REMARKS

Claims 11-15, 21, and 22 were pending in this application. Claims 11 and 12 are
amended herein. It is believed that no new matter has been added. No claim has been allowed. 
Claims 11-15, 21 and 22 are currently pending. 

Formal Matters 

Applicants gratefully acknowledge the entry of the amendments requested in Paper No. 
11, filed May 5, 2003 and the entry of the updated priority information. 

The Office objects to the specification, stating that Table 4 was not deleted because
the amendment did not request such a deletion. The Amendment filed May 5, 2003 requests the
deletion of this paragraph at page 3 under Amendments to the Specification.

The Office further objects to the specification stating that Tables 1 and 2 should also be 
deleted. The specification is amended herein to delete Tables 1 and 2. 

The Office objects to the title stating that the title is not aptly descriptive. The title is 
amended herein. 

Applicants gratefully acknowledge the withdrawal of the rejections under 
35 U.S.C. § 112, second paragraph.

Claims 11 and 12 are amended herein in view of the suggestions made in the
Examiner Interview of November 24, 2003 to further clarify the claimed compositions.

In view of the above, Applicants respectfully submit that the objections are overcome
and request the withdrawal of the objections.

Summary of Examiner Interview 

Applicants greatly appreciate the time and effort of Examiner O'Hara and Supervisory 
Patent Examiner ("SPE") Spector in the Examiner Interview of November 24, 2003. It was very 
productive. The summary of the interview is as follows: Examiner O'Hara and SPE Spector agreed 
that the specification fulfilled the utility requirement under 35 U.S.C. § 101, and thus the written
description requirement under 35 U.S.C. § 112, first paragraph, for the nucleotide sequences disclosed in the
application. However, SPE Spector indicated that the mRNA levels demonstrated by in situ
hybridization and PCR are not necessarily sufficient to support utility for a protein or a compound
binding the protein, and therefore such data may not fulfill the utility and related written description
requirements for a protein and a binding compound for the protein. We discussed the publications
in the journals Electrophoresis (Exhibit A) and Molecular Cell Biology (Exhibit B) as they relate to 
the proposition that mRNA expression does not necessarily correspond with protein expression. 
Applicants pointed out that the inconsistencies between mRNA and protein expression lie only in a
subset of mRNA transcripts of fewer than 10 transcript copies per cell. SPE Spector indicated that
such an argument would need to be made of record for proper consideration. SPE Spector also
suggested minor amendments to the current claims, asking that the terms "binding compound" and
"antibody binding site" be clarified.

Rejection Under 35 U.S.C. § 101 and § 112, first paragraph

Claims 11-15, 21 and 22 are rejected under 35 U.S.C. §§ 101 and 112, first paragraph, as
allegedly failing to provide either a specific and substantial asserted utility or a well established 
utility. According to the Examiner Interview of November 24, 2003, the remaining basis supporting 
this rejection is in the assertion that the mRNA data disclosed in the specification does not provide a 
specific and substantial utility or a well established utility because mRNA expression does not 
necessarily predict protein expression. SPE Spector indicated that the remaining grounds asserted 
in the Action dated August 8, 2003 were either moot or secondary to the issue of whether mRNA 
expression predicts or correlates with protein expression. Therefore, the substance of this response 
addresses the sole issue of whether mRNA expression predicts or correlates with protein expression. 

Applicants traverse this rejection. 

Applicants respectfully submit that there is no evidence that mRNA expression detected
via traditional means, e.g., in situ hybridization, does not predict or correlate with protein
expression. As discussed in the interview, mRNA detection technology has improved dramatically,
allowing detection of mRNA that is expressed as a single copy per cell in, e.g., the SAGE method.
See, e.g., Exhibit B, at page 1728, first column (discussing the sensitivity of the SAGE method as
permitting the detection of a single copy of mRNA per cell 72% of the time). Thus, a new class of
mRNA transcripts is available for analysis, namely mRNA transcripts expressed at an extremely
low copy number of 10 or less per cell. Not surprisingly, this class of rare transcripts does not
appear to have all the properties associated with mRNA transcripts expressed in higher copy
number. However, this deviation does not diminish the well accepted view that mRNA expression
correlates with protein expression. A careful review of two publications addressing this issue reveals
the critical distinctions between mRNA transcripts detected only by the SAGE method and those
transcripts detectable via less sensitive, traditional methods such as in situ hybridization.

In Haynes et al., Electrophoresis 19:1862-71 (1998) (Exhibit A) (hereinafter "the
Haynes review"), the authors conclude that protein levels in a cell cannot be accurately predicted
from the level of the corresponding mRNA transcript, a potentially ground-breaking conclusion that
challenges years of established dogma regarding mRNA and protein correlation. See the Haynes
review, at page 1863, ¶ 2.1 (the authors conclude that there is "a general trend but no strong
correlation between protein and transcript levels."). To support this conclusion, the Haynes review
analyzes a group of 87 genes in yeast and presents a summary of the authors' data generated using the SAGE
method as a means to detect mRNA expression. However, the data in Haynes supporting this
ground-breaking conclusion is incomplete. In particular, the Haynes review lacks several
substantive pieces of data needed to support its broad conclusions, namely a statistical analysis of
the correlation between mRNA and protein expression in Figure 1. As review articles are typically
not subject to a rigorous peer review process, such articles are not necessarily required to complete a
rigorous analysis of the data presented. Therefore, from the perspective of the person of ordinary
skill in the art, the assertions in the Haynes review would be considered speculative in the absence of
more rigorous analysis.

A more rigorous and complete analysis of the data in the Haynes review definitively 
demonstrates that the conclusions made in the Haynes review were incorrect if applied to mRNA 
transcripts other than those expressed at a very low copy number. In other words, in Gygi et al.,
Mol. Cell Biol. 19(3):1720-30 (1999) (Exhibit B) (hereinafter "the MCB paper"), a thorough
analysis of the data supports the traditional dogma regarding the positive correlation between
mRNA and protein expression for highly expressed mRNA transcripts. More specifically, after
examining more genes, i.e., 106, with a strikingly similar expression profile to that profile first
reported in the Haynes review, the authors conclude in the MCB paper that there was "a general
trend of increased protein levels resulting from increased mRNA levels." See Exhibit B, at page
1726. In fact, the correlation coefficient for this general trend was 0.935. See Exhibit B, Figure 5.
Thus, with a rigorous statistical analysis of the data, the correlation between mRNA levels and 
protein expression was readily apparent. The variance between transcript level and protein 
expression asserted in the Haynes review is observed largely within the population of transcripts 
present at 10 copies per cell or less. In other words, the conclusion asserted by the authors in the 
Haynes review applies only to a subset of the mRNA transcripts examined, i.e., those at a copy 
number of 10 or less per cell. 
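
By way of illustration only, the statistical point above can be sketched in a few lines of Python. The values below are synthetic and hypothetical; they are not taken from Exhibit A or Exhibit B, and the function name pearson_r is an assumption introduced for this sketch. The sketch merely shows that a data set may exhibit a strong overall Pearson correlation coefficient while the scatter is concentrated in the low copy number subset.

```python
import math


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Synthetic, hypothetical values (NOT data from Exhibit A or Exhibit B).
# mRNA abundance in copies/cell; protein abundance in thousands of copies/cell.
low_mrna = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
low_protein = [40, 5, 60, 10, 80, 15, 30, 90, 20, 50]
high_mrna = [50, 100, 150, 200]
high_protein = [300, 600, 900, 1200]

# Correlation within the low copy subset alone, and across all transcripts.
r_low = pearson_r(low_mrna, low_protein)
r_all = pearson_r(low_mrna + high_mrna, low_protein + high_protein)

print(f"low copy subset only: r = {r_low:.2f}")
print(f"all transcripts:      r = {r_all:.2f}")
```

With these made-up values the coefficient for the low copy subset is weak, while the coefficient across all transcripts is strong, which is the qualitative pattern described above. The sketch says nothing about the actual magnitudes reported in the exhibits.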

The conclusions of the Haynes review (and the MCB paper) regarding the subset of low 
copy mRNA do not invalidate the conclusion of high protein expression in the cells with high 
mRNA expression based on the evidence disclosed in the instant specification. First, Applicants 
note that both the Haynes review and the MCB paper indicate that the greatest variance in the 
correlation between mRNA and protein expression occurs in the subset of mRNA transcripts at a
low copy number, i.e., 10 copies or less per cell, a subset of mRNA transcripts at or below the
detection limit of the in situ hybridization analysis employed in the instant specification. The
detection limits of in situ hybridization analysis make it difficult to routinely detect mRNA
transcripts present at 10 or fewer copies per cell. Thus, it is questionable whether this subset of
transcripts is even detectable in the in situ hybridization analysis employed in the instant 
application. Second, assuming arguendo that in situ hybridization analysis permits the detection of 
such a rare transcript, the Haynes review and the MCB paper indicate that the protein expression 
resulting from this low mRNA copy subset is still significantly below that of the protein expression 
from the high mRNA copy subset. For example, Figure 5 in the MCB paper indicates that a mRNA
transcript of ~45 copies/cell results in a protein abundance of ~50-100,000 copies/cell. On the other
hand, a mRNA transcript of ~200 copies/cell results in a protein abundance of ~375,000 copies/cell.
Such differences are not insubstantial and are readily detectable using well known and routine 
methods in the art. Therefore, while there may be only an inexact correlation between mRNA and 
protein expression seen for the low mRNA copy subset, the range in protein expression resulting 
from this subset is still well below that observed within the high mRNA copy subset. In other 
words, the highest protein level resulting from a low mRNA copy transcript is still significantly
lower than the detectably higher protein level resulting from the more abundant mRNA transcripts.
Third, neither the Haynes review nor the MCB paper provides evidence or suggests that a high
mRNA copy transcript does not result in high protein expression. For all of these reasons, the 
conclusion that high mRNA expression correlates with high protein expression remains credible and 
is supported by the Haynes review and the MCB paper. 

In view of the above, Applicants respectfully submit that the data disclosed in the instant
application supports the prediction of protein expression in inflammatory and allergic responses, a 
specific, substantial, and credible utility. Therefore, the basis for this rejection may be withdrawn. 

Priority 

The Office has denied the benefit of priority under 35 U.S.C. § 119(e) from an earlier
application, alleging that the prior application fails to meet the requirements of 35 U.S.C. § 112,
first paragraph in view of the earlier application's failure to provide either a specific and substantial 
utility or a well established utility. Applicants traverse this refusal to recognize the priority claim of 
the instant application. 

As stated in the Examiner Summary, SPE Spector and Examiner O'Hara agreed that the
utility requirement under 35 U.S.C. § 101 was met for the disclosed nucleotide sequences at a
minimum in the instant application in view of the disclosed in situ hybridization data. Because the
earlier filed provisional applications also disclose the in situ hybridization data for the RANKL
expression, the disclosures of at least the provisional application Serial No. 60/099,999, filed
September 11, 1998, fulfill the utility and related written description requirements under 35 U.S.C.
§§ 101 and 112, respectively.* Therefore, Applicants should receive the benefit of priority under 35
U.S.C. § 119(e) for these applications, rendering the earliest priority date for the claimed
compositions at least September 11, 1998.

In view of the above, Applicants request that the Office acknowledge that the instant
application is entitled to the earlier effective filing date of September 11, 1998. 



* The in situ hybridization data can be found at pages 11-12 of Application Serial No. 60/099,999, filed September
11, 1998 per the electronic copy of the filed application currently available to Applicants. 

Rejection Under 35 U.S.C. § 102(e)

Claims 11-14, 21 and 22 are rejected under 35 U.S.C. § 102 (e) as allegedly being 
anticipated by Goddard et al., U.S. Published Application 20030092044, effective filing date April 
12, 1999 for reasons of record. Applicants traverse this rejection. 

Applicants respectfully submit that Goddard is not a proper reference under 
35 U.S.C. § 102 (e). Goddard is cited as a reference under 35 U.S.C. § 102 (e) in view of the 
nucleotide sequence disclosed therein. Because the in situ hybridization and PCR data disclosed in 
the instant application and the earlier filed provisional applications fulfill the requirements under 
35 U.S.C. §§ 101 and 112 for the nucleotide sequences at a minimum in the instant application, the
earliest effective filing date for the instant application is at least September 11, 1998 for the
disclosed nucleotide sequences. As Goddard's earliest effective filing date is after 
September 11, 1998, it is not a proper reference under 35 U.S.C. § 102(e). 

In view of the above, Applicants submit that the basis of the rejection may be removed.

Rejection Under 35 U.S.C. § 103(a)

Claim 15 is rejected under 35 U.S.C. § 103(a) as allegedly being unpatentable over
Goddard et al., U.S. Published Application 20030092044, effective filing date April 12, 1999, and
further in view of Akita et al., U.S. Patent No. 5,968,511 for reasons of record. Applicants traverse
this rejection. 

As discussed above, Applicants respectfully submit that Goddard is not a proper
reference under 35 U.S.C. § 103 (a). Goddard is cited as a reference under 35 U.S.C. § 102 (e) in 
view of the nucleotide sequence disclosed therein. Because the in situ hybridization and PCR data 
disclosed in the instant application and the earlier filed provisional applications fulfill the 
requirements under 35 U.S.C. §§ 101 and 112 for the nucleotide sequences at a minimum, the 
earliest effective filing date for the instant application is at least September 11, 1998 for the 
nucleotide sequences. As Goddard's earliest effective filing date is after September 11, 1998, it is 
not a proper reference under 35 U.S.C. § 103 (a). In the absence of Goddard, Akita does not 
disclose, suggest, or teach each and every element of the claimed composition, and therefore fails to 
establish prima facie obviousness. 

In view of the above, Applicants submit that the basis of the rejection may be removed.




Application No.: 09/840,795 






Docket No.: 140942000401 



CONCLUSION

In view of the above, each of the claims in this application is believed to be in immediate
condition for allowance. Accordingly, the Examiner is respectfully requested to withdraw the
outstanding rejections of the claims and to pass this application to issue.

In the unlikely event that the transmittal letter is separated from this document and the
Patent Office determines that an extension and/or other relief is required, applicant petitions for any
required relief including extensions of time and authorizes the Assistant Commissioner to charge the
cost of such petitions and/or other fees due in connection with the filing of this document to Deposit
Account No. 03-1952 referencing docket no. 140942000401. However, the Assistant
Commissioner is not authorized to charge the cost of the issue fee to the Deposit Account.

Dated: December 8, 2003                    Respectfully submitted,

Registration No.: 51,804
MORRISON & FOERSTER LLP
3811 Valley Centre Drive, Suite 500
San Diego, California 92130
(858) 720-7955

Exhibit A

Review

Proteome analysis: Biological assay or data archive?

Paul A. Haynes
Steven P. Gygi
Daniel Figeys
Ruedi Aebersold

Department of Molecular Biotechnology, University of Washington, Seattle, WA, USA

In this review we examine the current state of proteome analysis. There are
three main issues discussed: why it is necessary to study proteomes; how
proteomes can be analyzed with current technology; and how proteome analysis
can be used to enhance biological research. We conclude that proteome
analysis is an essential tool in the understanding of regulated biological systems.
Current technology, while still mostly limited to the more abundant proteins,
enables the use of proteome analysis both to establish databases of proteins
present, and to perform biological assays involving measurement of multiple
variables. We believe that the utility of proteome analysis in future biological
research will continue to be enhanced by further improvements in analytical
technology.

Contents

1 Introduction
2 Rationale for proteome analysis
2.1 Correlation between mRNA and protein expression levels
2.2 Proteins are dynamically modified and processed
2.3 Proteomes are dynamic and reflect the state of a biological system
3 Description and assessment of current proteome analysis technology
3.1 Technical requirements of proteome technology
3.2 2-D electrophoresis - mass spectrometry: a common implementation of proteome analysis
3.3 Protein identif[...]
3.3.1 [...]
3.3.2 [...]
3.3.3 [...]
3.4 Assessment of 2-DE-MS proteome technology
4 Utility of proteome analysis for biological research
4.1 The proteome as a database
4.2 The proteome as a biological assay
5 Concluding remarks
6 References



1 Introduction

A proteome has been defined as the protein complement
expressed by the genome of an organism, or, in multicellular
organisms, as the protein complement expressed by a
tissue or differentiated cell [1]. In the most common
implementation of proteome analysis the proteins extracted
from the cell or tissue analyzed are separated by high
resolution two-dimensional gel electrophoresis (2-DE),
[...] identified by their amino acid sequence. The ease,
sensitivity and speed with which gel-separated proteins can
be identified by the use of recently developed mass
spectrometric techniques have dramatically increased the
interest in proteome technology. One of the most attractive
features of such analyses is that complex biological systems
can potentially be studied in their [...] multitude of
individual components [...] relationships between mature
gene products in cells. Large-scale proteome characterization
projects have been undertaken for a number of different
organisms and cell types. Microbial proteome projects [...]
example: Saccharomyces cerevisiae [2], Salmonella enterica [3],
Spiroplasma melliferum [4], Mycobacterium tuberculosis [5],
Ochrobactrum anthropi [6], Haemophilus influenzae [7],
Synecho[...] [10], and Dictyostelium discoideum [11]. Proteome
projects underway for tissues of more complex organisms
include those for: human bladder squamous cell
carcinomas [12], human liver [13], human plasma [13],
human keratinocytes [12], human fibroblasts [12], mouse
[...], and rat serum [14]. In this manuscript we critically
assess the concept of proteome analysis and the
feasibility of establishing complete proteome [...]
proteome analysis and [...] research intersect.
Correspondence: Professor Ruedi Aebersold, Department of Molecular
Biotechnology, University of Washington, Box 357730, Seattle, WA,
98195, USA (Tel: +206-685-4235; Fax: +206-685-6392; E-mail:
ruedi@u.washington.edu)

Abbreviations: CID, collision-induced dissociation; MS/MS, tandem 
mass spectrometry; SAGE, serial analysis of gene expression 

Keywords: Proteome / Two-dimensional polyacrylamide gel
electrophoresis / Tandem mass spectrometry



2 Rationale for proteome analysis

The dramatic growth in both the number of genome 
projects and the speed with which genome sequences 
are being determined has generated huge amounts of 
sequence information, for some species even complete 
genomic sequences ([15-17]). The description of the
state of a biological system by the quantitative measure- 
ment of system components has long been a primary 
objective in molecular biology. With recent technical 
advances including the development of differential dis- 
play-PCR [18], cDNA microarray and DNA chip techno- 
logy [19, 20] and serial analysis of gene expression 
(SAGE) [21, 22], it is now feasible to establish global and 
quantitative mRNA expression maps of cells and tissues, 
in which the sequence of all the genes is known, at a
speed and sensitivity which is not matched by current
protein analysis technology. Given the long-standing
paradigm in biology that DNA synthesizes RNA which
synthesizes protein, and the ability to rapidly establish
global, quantitative mRNA expression maps, the questions
which arise are why technically complex proteome
projects should be undertaken and what specific types of
information could be expected from proteome projects
which cannot be obtained from genomic and transcript
profiling projects. We see three main reasons for proteome
analysis to become an essential component in the
comprehensive analysis of biological systems. (i) Protein
expression levels are not predictable from the mRNA 
expression levels, (ii) proteins are dynamically modified 
and processed in ways which are not necessarily 
apparent from the gene sequence, and (iii) proteomes 
are dynamic and reflect the state of a biological system. 

2.1 Correlation between mRNA and protein expression 
levels 

Interpretations of quantitative mRNA expression profiles 
frequently implicitly or explicitly assume that for specific 
genes the transcript levels are indicative of the levels of 
protein expression. As part of an ongoing study in our 
laboratory, we have determined the correlation of expression
at the mRNA and protein levels for a population of
selected genes in the yeast Saccharomyces cerevisiae
growing at mid-log phase (S. P. Gygi et al., submitted for
publication). mRNA expression levels were calculated
from published SAGE frequency tables [22]. Protein
expression levels were quantified by metabolic radiola- 
beling of the yeast proteins, liquid scintillation counting 
of the protein spots separated by high resolution 2-DE 
and mass spectrometric identification of the protein(s) 
migrating to each spot. The selected 80 samples consti- 
tute a relatively homogeneous group with respect to pre- 
dicted half-life and expression level of the protein pro- 
ducts. Thus far, we have found a general trend but no 
strong correlation between protein and transcript levels 
(Fig. 1). For some genes studied equivalent mRNA trans- 
cript levels translated into protein abundances which 
varied by more than 50-fold. Similarly, equivalent steady-state
protein expression levels were maintained by transcript
levels varying by as much as 40-fold (S. P. Gygi
et al., submitted). These results suggest that even for a
population of genes predicted to be relatively homoge- 
neous with respect to protein half-life and gene expres- 
sion, the protein levels cannot be accurately predicted 
from the level of the corresponding mRNA transcript. 

2.2 Proteins are dynamically modified and processed

In the mature, biologically active form many proteins are
post-translationally modified by glycosylation, phosphorylation,
prenylation, acylation, ubiquitination or one or
more of many other modifications [23]. Many proteins
are only functional if specifically associated or complexed
with other molecules, including DNA, RNA, proteins
and organic and inorganic cofactors. Frequently,
modifications are dynamic and reversible and may alter
the precise three-dimensional structure and the state of
activity of a protein. Collectively, the state of modification
of the proteins which constitute a biological system
are important indicators for the state of the system. The
type of protein modification and the sites modified at a
specific cellular state can usually not be determined
from the gene sequence alone.

Figure 1. Correlation between mRNA and protein levels in yeast cells.
For a selected population of 80 genes, protein levels were measured
by 35S-radiolabeling and mRNA levels were calculated from published
SAGE tables. Inset: expanded view of the low abundance region.
For more experimental details, also see Figs. 5 and 6 (S. P. Gygi et al.,
submitted).

2.3 Proteomes are dynamic and reflect the state of a 
biological system 

A single genome can give rise to many qualitatively and 
quantitatively different proteomes. Specific stages of the 
cell cycle and states of differentiation, responses to 
growth and nutrient conditions, temperature and stress, 
and pathological conditions represent cellular states 
which are characterized by significantly different proteomes.
The proteome, in principle, also reflects events
that are under translational and post-translational control.
It is therefore expected that proteomics will be able
to provide the most precise and detailed molecular description
of the state of a cell or tissue, provided that the
external conditions defining the state are carefully determined.
In answer to the question of whether the study
of proteomes is necessary for the analysis of biomolecular
systems, it is evident that the analysis of mature protein
products in cells is essential as there are numerous
levels of control of protein synthesis, degradation,
processing and modification, which are only apparent by
direct protein analysis.



3 Description and assessment of current proteome 
analysis technology 

3.1 Technical requirements of proteome technology 

In biological systems the level of expression as well as 
the states of modification, processing and macro-molec- 
ular association of proteins are controlled and modu- 
lated depending on the state of the system. Comprehen- 
sive analysis of the identity, quantity and state of modifi- 
cation of proteins therefore requires the detection and 
quantitation of the proteins which constitute the system,
and analysis of differentially processed forms. There are
a number of inherent difficulties in protein analysis
which complicate these tasks. First, proteins cannot be 
amplified. It is possible to produce large amounts of a 
particular protein by over-expression in specific cell sys- 
tems. However, since many proteins are dynamically 
post-translationally modified, they cannot be easily am- 
plified in the form in which they finally function in the 
biological system. It is frequently difficult to purify from 
the native source sufficient amounts of a protein for 
analysis. From a technological point of view this translates
into the need for high sensitivity analytical techniques.
Second, many proteins are modified and processed
post-translationally. Therefore, in addition to the
protein identity, the structural basis for differentially
modified isoforms also needs to be determined. The distribution
of a constant amount of protein over several
differentially modified isoforms further reduces the
amount of each species available for analysis. The complexity
and dynamics of post-translational protein editing
thus significantly complicates proteome studies.
Third, proteins vary dramatically with respect to their 
solubility in commonly used solvents. There are few, if 
any, solvent conditions in which all proteins are soluble
and which are also compatible with protein analysis. This
makes the development of protein purification methods
particularly difficult since both protein purification and
solubility have to be achieved under the same conditions.
Detergents, in particular sodium dodecyl sulfate
(SDS), are frequently added to aqueous solvents to
maintain protein solubility. The compatibility with SDS
is a big advantage of SDS polyacrylamide gel electrophoresis
(SDS-PAGE) over other protein separation
techniques. Thus, SDS-PAGE and two-dimensional gel
electrophoresis, which also uses SDS and other deter- 
gents, are the most general and preferred methods for 
the purification of small amounts of proteins, provided 
that activity does not necessarily need to be maintained. 
Lastly, the number of proteins in a given cell system is
typically in the thousands. Any attempt to identify and
categorize all of these must use methods which are as
rapid as possible to allow completion of the project 
within a reasonable time frame. Therefore, a successful, 
general proteomics technology requires high sensitivity, 
high throughput, the ability to differentiate differentially 
modified proteins, and the ability to quantitatively dis- 
play and analyze all the proteins present in a sample. 

3.2 2-D electrophoresis — mass spectrometry: a common 
implementation of proteome analysis 

The most common currently used implementation of 
proteome analysis technology is based on the separation
of proteins by two-dimensional (IEF/SDS-PAGE) gel
electrophoresis and their subsequent identification and
analysis by mass spectrometry (MS) or tandem mass
spectrometry (MS/MS). In 2-DE, proteins are first separated
by isoelectric focusing (IEF) and then by SDS-PAGE,
in the second, perpendicular dimension. Separated
proteins are visualized at high sensitivity by staining
or autoradiography, producing two-dimensional arrays of 
proteins. 2-DE gels are, at present, the most commonly 
used means of global display of proteins in complex
samples. The separation of thousands of proteins has
been achieved in a single gel [24, 25] and differentially
modified proteins are frequently separated. Due to the
compatibility of 2-DE with high concentrations of detergents,
protein denaturants and other additives promoting
protein solubility, the technique is widely used. 

The second step of this type of proteome analysis is the 
identification and analysis of separated proteins. Individ- 
ual proteins from polyacrylamide gels have traditionally 
been identified using N-terminal sequencing [26, 27],
internal peptide sequencing [28, 29], immunoblotting or
comigration with known proteins [30]. The recent dramatic
growth of large-scale genomic and expressed
sequence tag (EST) sequence databases has resulted in a
fundamental change in the way proteins are identified by
their amino acid sequence. Rather than by the traditional
methods described above, protein sequences are now fre- 
quently determined by correlating mass spectral or 
tandem mass spectral data of peptides derived from pro- 
teins, with the information contained in sequence data- 
bases [31-33]. 

There are a number of alternative approaches to pro- 
teome analysis currently under development. There is 
considerable interest in developing a proteome analysis 
strategy which bypasses 2-DE altogether, because it is
considered a relatively slow and tedious process, and 
because of perceived difficulties in extracting proteins 
from the gel matrix for analysis. However, 2-DE as a 
starting point for proteome analysis has many advan- 
tages compared to other tediniques available today. The 
most significant strengths of the 2-DE-MS approach 
include the relatively uniform behavior of proteins in 
gels, the abiUty to quantify spots and the high resolution 
and simultaneous display of hundreds to thousands of 
proteins within a reasonable time frame. 

A schematic diagram of a typical procedure for the
identification of gel-separated proteins is shown in Fig. 2.
Protein spots detected in the gel are enzymatically or
chemically fragmented and the peptide fragments are isolated
for analysis, as already indicated, most frequently by MS
or MS/MS. There are numerous protocols for the generation
of peptide fragments from gel-separated proteins.
They can be grouped into two categories: digestion in
the gel slice [28, 34] or digestion after electrotransfer out
of the gel onto a suitable membrane ([29, 35-37] and
reviewed in [38]). In most instances either technique is
applicable and yields good results. The analysis of MS or
MS/MS data is an important step in the whole process
because MS instruments can generate an enormous
amount of information which cannot easily be managed
manually. Recently, a number of groups have developed
software systems dedicated to the use of peptide MS
and MS/MS spectra for the identification of proteins.
Proteins are identified by correlating the information
contained in the MS spectra of protein digests or
MS/MS spectra of individual peptides with data contained
in DNA or protein sequence databases.
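
As a rough illustration of how the peptide masses in an MS spectrum of a
protein digest can be correlated with a sequence database, the following
Python sketch digests each database sequence in silico and counts mass
matches. It is a minimal sketch under stated assumptions, not the software
used by any of the groups cited: the function names, the 0.5 Da tolerance,
the trypsin rule without missed cleavages and the two example sequences are
all hypothetical and chosen for illustration only.

```python
import re

# Standard monoisotopic residue masses in daltons.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # added once per peptide for the free termini


def tryptic_peptides(sequence):
    """Cleave after K or R unless followed by P (no missed cleavages)."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", sequence) if p]


def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def count_matches(observed_masses, sequence, tol=0.5):
    """Number of observed masses explained by the digest of one sequence."""
    theoretical = [peptide_mass(p) for p in tryptic_peptides(sequence)]
    return sum(
        any(abs(m - t) <= tol for t in theoretical) for m in observed_masses
    )


def best_match(observed_masses, database, tol=0.5):
    """Name of the database entry whose digest explains the most masses."""
    return max(
        database,
        key=lambda name: count_matches(observed_masses, database[name], tol),
    )


if __name__ == "__main__":
    # Hypothetical two-entry database and a simulated spectrum.
    database = {"protein_a": "AGKSVR", "protein_b": "TWKDER"}
    observed = [peptide_mass("AGK"), peptide_mass("SVR")]
    print(best_match(observed, database))
```

A real search would also have to handle missed cleavages, modified residues
and a statistically meaningful score; the simple match count is used here
only to make the principle visible.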

Figure 2. Schematic diagram of a procedure for identification of
gel-separated proteins. Peptides can either be separated by a technique
such as LC or CE, or infused as a mixture and sorted in the MS.
Database searching can either be performed on peptide masses from an
MS spectrum, peptide fragment masses from CID spectra of peptides,
or a combination of both.

The systems we are currently using in our laboratory are
based on the separation of the peptides contained in protein
digests by narrow bore or capillary liquid chromatography
[39, 40] or capillary electrophoresis [41], the analysis
of the separated peptides by electrospray ionization
(ESI) MS/MS, and the correlation of the generated
peptide spectra with sequence databases using the
SEQUEST program developed at the University of Washington
[32, 33]. The system automatically performs the
following operations: a particular peptide ion characterized
by its mass-to-charge ratio is selected in the MS out
of all the peptide ions present in the system at a particular
time; the selected peptide ion is collided in a collision
cell with argon (collision-induced dissociation,
CID) and the masses of the resulting fragment ions are
determined in the second sector of the tandem MS; this
experimentally determined CID spectrum is then correlated
with the CID spectra predicted from all the peptides
in a sequence database which have essentially the
same mass as the peptide selected for CID; this correlation
matches the isolated peptide with a sequence segment
in a database and thus identifies the protein from
which the peptide was derived. There are a number of
alternative programs which use peptide CID spectra for
protein identification, but we use the SEQUEST system
because it is currently the most highly automated program
and has proven to be successful, versatile and
robust.
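
The sequence of operations described above (select candidates with
essentially the same mass, predict their CID fragments, correlate with the
measured spectrum) can be sketched in Python as follows. This is a
deliberately simplified illustration and not the SEQUEST algorithm: the
shared-peak count stands in for its actual correlation score, only singly
charged b- and y-ions are predicted, and the function names, tolerances and
example peptides are assumptions made for this sketch.

```python
# Standard monoisotopic residue masses in daltons.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728


def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def fragment_ions(peptide):
    """Predicted m/z values of singly charged b- and y-ions."""
    ions = []
    for i in range(1, len(peptide)):
        b_ion = sum(RESIDUE_MASS[aa] for aa in peptide[:i]) + PROTON
        y_ion = sum(RESIDUE_MASS[aa] for aa in peptide[i:]) + WATER + PROTON
        ions.extend((b_ion, y_ion))
    return ions


def shared_peaks(observed_mz, peptide, tol=0.5):
    """Count predicted fragments that have an observed peak within tol."""
    return sum(
        any(abs(mz - obs) <= tol for obs in observed_mz)
        for mz in fragment_ions(peptide)
    )


def identify(precursor_mass, observed_mz, candidates,
             mass_tol=1.0, frag_tol=0.5):
    """Best-scoring candidate with essentially the precursor mass, or None."""
    same_mass = [
        p for p in candidates
        if abs(peptide_mass(p) - precursor_mass) <= mass_tol
    ]
    if not same_mass:
        return None
    return max(same_mass, key=lambda p: shared_peaks(observed_mz, p, frag_tol))


if __name__ == "__main__":
    # Hypothetical candidates: the first two have identical composition
    # (and so identical mass) but different sequences.
    candidates = ["SAMPLER", "MASPLER", "GGGGK"]
    spectrum = fragment_ions("SAMPLER")  # simulated CID spectrum
    print(identify(peptide_mass("SAMPLER"), spectrum, candidates))
```

The example shows why the fragment spectrum matters: two peptides of the
same mass cannot be told apart by the precursor mass alone, but their
predicted fragment series differ.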



3.3 Protein identification by LC-MS/MS, capillary
LC-MS/MS and CE-MS/MS

It has been demonstrated repeatedly that MS has a very
high intrinsic sensitivity. For the routine analysis of
gel-separated proteins at high sensitivity, the most significant
challenge is the handling of small amounts of
sample. The crux of the problem is the extraction and
transferal of peptide mixtures generated by the digestion
of low nanogram amounts of protein, from gels into the
MS/MS system without significant loss of sample or
introduction of unwanted contaminants. We employ
three different systems for introducing gel-purified samples
into an MS, depending on the level of sensitivity
required. As an approximate guideline, for samples containing
tens of picomoles of peptides, LC-MS/MS is
most appropriate; for samples containing low picomole
amounts to high femtomole amounts we use capillary
LC-MS/MS; and for samples containing femtomoles or
less, CE-MS/MS is the method of choice.

3.3.1 LC-MS/MS

The coupling of an MS to an HPLC system using a
0.5 mm diameter or bigger reverse phase (RP) column
has been described in detail [42]. This system has several
advantages if a large number of samples are to be analyzed
and all are available in sufficient quantity. The
LC-MS and database searching program can be run in a
fully automated mode using an autosampler, thus maximizing
sample throughput and minimizing the need for
operator interference. The relatively large column is
tolerant of high levels of impurities from either gel preparation
or sample matrix. Lastly, if configured with a
flow-splitter and micro-sprayer [40], analyses can be performed
on a small fraction of the sample (less than 5%)
while the remainder of the sample is recovered in very
pure solvents. This latter feature is particularly useful
when an orthogonal technique is also used to analyze
peptide fractions, such as scintillation of an introduced
radiolabel, and this data can be correlated with peptides
identified by CID spectra.

3.3.2 Capillary LC-MS

An increase of sensitivity of approximately tenfold can be
achieved by using a capillary LC system with a 100 µm ID
column rather than a 0.5 mm ID column as referred to
above. Since very low flow rates are required for such
columns, most reports have used a precolumn flow splitting
system for producing solvent gradients. We have
recently described the design and construction of a novel
gradient mixing system which enables the formation
of reproducible gradients at very low flow rates (low
nL/min) without the need for flow splitting (A. Ducret
et al., submitted for publication). Using this capillary
LC-MS/MS system we were able to identify gel-separated
proteins if low picomole to high femtomole amounts
were loaded onto the gel [40]. This system is as yet not
automated and, like all capillary LC systems, is prone to
blockage of the columns by microparticulates when analyzing
gel-separated proteins.



3.3.3 CE-MS/MS

The highest level of sensitivity for analyzing gel-separated
proteins can be achieved by using capillary
electrophoresis-mass spectrometry (CE-MS). We have
described in the past a solid-phase extraction capillary
electrophoresis (SPE-CE) system which was used with triple
quadrupole and ion trap ESI-MS/MS systems for the
identification of proteins at the low femtomole to
subfemtomole sensitivity level [43, 44]. While this system is
highly sensitive, its operation is labor-intensive and
has not been automated. In order to devise an
analytical system with both the sensitivity of a CE and
the level of automation of LC, we have constructed
microfabricated devices for the introduction of samples
into ESI-MS for high-sensitivity peptide analysis.

Figure 3. Schematic illustration of a microfabricated analytical
system for CE, consisting of a micromachined device, coated
capillary electroosmotic pump, and microelectrospray interface.
The dimensions of the channels and reservoir are as indicated
in the text. The channels on the device were graphically enhanced
to make them more visible. Reproduced from [45], with permission.

The basic device is a piece of glass into which channels
of 10-30 µm in depth and 50-70 µm in diameter are
etched by using photolithography/etching techniques
similar to the ones used in the semiconductor industry.
(A simple device is shown in Fig. 3.) The channels are
connected to an external high voltage power supply [45].
Samples are manipulated on the device and off the
device to the MS by applying different potentials to the
reservoirs. This creates a solvent flow by electroosmotic
pumping which can be redirected by changing the position
of the electrode. Therefore, without the need for
valves or gates and without any external pumping, the
flow can be redirected by simply switching the position
of the electrodes on the device. The direction and rate of
the flow can be modulated by the size and the polarity
of the electric field applied and also by the charge state
of the surface.

The type of data generated by the system is illustrated in
Fig. 4, which shows the mass spectrum of a peptide sample
representing the tryptic digest of carbonic anhydrase at
290 fmol/µL. Each numbered peak indicates a peptide
successfully identified as being derived from carbonic
anhydrase. Some of the unassigned signals may be chemical
or peptide contaminants. The MS is programmed to
automatically select each peak and subject the peptide to CID.
The resulting CID spectra are then used to identify the
protein by correlation with sequence databases. Therefore,
this system allows us to concurrently apply a number of
protein digests onto the device, to sequentially mobilize
the samples, to automatically generate CID spectra of
selected peptide ions and to search sequence databases
for protein identification. These steps are performed
automatically without the need for user input and proteins can
be identified at very low femtomole level sensitivity at a
rate of approximately one protein per 15 min.

Figure 4. MS spectrum of a tryptic digest of carbonic anhydrase
using the microfabricated system shown in Fig. 3. 290 fmol/µL of
carbonic anhydrase tryptic digest was infused into a Finnigan LCQ
ion trap MS. Each peak was selected for CID, and those which were
identified as containing peptides derived from carbonic anhydrase
are numbered. Reproduced from [45], with permission.

3.4 Assessment of 2-DE-MS proteome technology

Using a combination of the analytical techniques described
above we have identified the 80 protein spots
indicated in Fig. 5. The protein pattern was generated by
separating a total of 40 micrograms of protein contained
in a total cell lysate of the yeast strain YPH499 by high
resolution 2-DE and silver staining of the separated proteins.
To estimate how far this type of proteome analysis
can penetrate towards the identification of low abundance
proteins, we have calculated the codon bias of the
genes encoding the respective proteins. Codon bias is a
calculated measure of the degree of redundancy of triplet
DNA codons used to produce each amino acid in a
particular gene sequence. It has been shown to be a
useful indicator of the level of the protein product of a
particular gene sequence present in a cell [46]. The general
rule which applies is that the higher the value of the
codon bias calculated for a gene, the more abundant the
protein product of that gene becomes. The calculated
codon bias values corresponding to the proteins identified
in Fig. 5 are shown in Fig. 6b. Nearly all of the proteins
identified (> 95%) have codon bias values of > 0.2,
indicating they are highly abundant in cells. In contrast,
codon bias values calculated for the entire yeast genome
(Fig. 6a) show that the majority of proteins present in
the proteome have a codon bias of < 0.2 and are thus of
low abundance.

Figure 5. 2-DE separation of a lysate of yeast cells, with identified
proteins highlighted. The first dimension of separation was an IPG from
pH 3-10, and the second dimension was a 10%T SDS-PAGE gel. Proteins
were visualized by silver staining. Further details of experimental
procedures are included in S. P. Gygi et al. (submitted).
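
The comparison made in this section amounts to asking what fraction of a
set of genes lies above a codon bias threshold of 0.2. A minimal Python
sketch of that tally is given below. The codon bias values in it are
hypothetical placeholders chosen for illustration; they are not the values
plotted in Fig. 6, and the function name is an assumption of this sketch.

```python
def fraction_above(codon_bias_values, threshold=0.2):
    """Fraction of genes whose codon bias is strictly above the threshold."""
    if not codon_bias_values:
        raise ValueError("no codon bias values supplied")
    above = sum(value > threshold for value in codon_bias_values)
    return above / len(codon_bias_values)


if __name__ == "__main__":
    # Hypothetical codon bias values, for illustration only.
    identified_spots = [0.85, 0.62, 0.47, 0.91, 0.33, 0.15]
    whole_genome = [0.05, 0.12, 0.08, 0.62, 0.18, 0.03, 0.25, 0.10]
    print("identified proteins above 0.2:", fraction_above(identified_spots))
    print("whole genome above 0.2:", fraction_above(whole_genome))
```

A large gap between the two fractions is what indicates that the identified
spots are drawn mainly from the abundant end of the proteome.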

This finding is of considerable importance in our assessment
of the current status of proteome analysis technology.
It is clear that even using highly sensitive analytical
techniques, we are only able to visualize and identify the
more abundant proteins. Since many important regulatory
proteins are present only at low abundance, these
would not be amenable to analysis using such techniques.
This situation would be exacerbated in the analysis
of proteomes containing many more proteins than
the approximately 6000 gene products present in yeast
cells [16]. In the analysis of, for example, the proteome
of any human cells, there are potentially 50000-100000
gene products [47]. Inherent limitations on the amount
of protein that can be loaded on 2-DE, and the number
of components that can be resolved, indicate that only
the most highly abundant fraction of the many gene
products could be successfully analyzed. One approach
that has been employed to circumvent these limitations
is the use of very narrow range immobilized pH gradient
strips for the first-dimension separation of 2-DE [48].
Since only those proteins which focus within the narrow
range will enter the second dimension of separation, a
much higher sample loading within the desired range is
possible. This, in turn, can lead to the visualization and
identification of less abundant proteins.

Figure 6. Calculated codon bias values for yeast proteins. (A)
Distribution of calculated values for the entire yeast proteome. (B)
Distribution of calculated values for the subset of 80 identified
proteins also shown in Figs. 1 and 5. Further details of experimental
procedures are included in S. P. Gygi et al. (submitted).



4 Utility of proteome analysis for biological 
research 

For the success of proteomics as a mainstream approach
to the analysis of biological systems it is essential to
define how proteome analysis and biological research
projects intersect. Without a clear plan for the implementation
of proteome-type approaches into biological research
projects the full impact of the technology cannot
be realized. The literature indicates that proteome analysis
is used both as a database/data archive, and as a biological
assay or biological research tool.

4.1 The proteome as a database

The use of proteomics as a database or data archive
essentially entails an attempt to identify all the proteins
in a cell or species and to annotate each protein with the
known biological information that is relevant for each
protein. The level of annotation can, of course, be extensive.
The most common implementation of this idea is
the separation of proteins by high resolution 2-DE, the
identification of each detected protein spot and the
annotation of the protein spots in a 2-DE gel database
format. This approach is complicated by the fact that it is
difficult to precisely define a proteome and to decide
which proteome should be represented in the database.
In contrast to the genome of a species, which is essentially
static, the proteome is highly dynamic. Processes
such as differentiation, cell activation and disease can all
significantly change the proteome of a species. This is
illustrated in Fig. 7. The figure shows two high-resolution
2-DE maps of proteins isolated from rat serum.
Fig. 7A is from the serum of normal rats, while Fig. 7B
is from acute-phase serum of rats after
prior treatment with an inflammation-causing agent [49].
It is obvious that the protein patterns are significantly
different in several areas, raising the question of exactly
which proteome is being described.

Therefore, a comprehensive proteome database of a species
or cell type needs to contain all of the parameters
which describe the state and the type of the cells from
which the proteins were extracted as well as the software
tools to search the database with queries which reflect
the dynamics of biological systems. A comprehensive
proteome database should be capable of quantitatively
describing the fate of each protein if specific systems
and pathways are activated in the cell. Specifically, the
quantity, the degree of modification, the subcellular location
and the nature of molecules specifically interacting
with a protein as well as the rate of change of these
variables should be described. Using these admittedly
stringent criteria, there is currently no complete proteome
database. A number of such databases are, however, in
the process of being constructed. The most advanced
among them, in our opinion, are the yeast protein database
YPD [50] (accessible at http://www.ypd.com) and
the human 2D-PAGE databases of the Danish Centre
for Human Genome Research [12] (accessible at
http://biobase.dk/cgi-bin/celis). While neither can be considered
complete as not all of the potential gene products
are identified, both contain extensive annotation
of supplemental information for many of the spots
which are positively identified in reference samples.

4.2 The proteome as a biological assay 

The use of proteome analysis as a biological assay or
research tool represents an alternative approach to integrating
biology with proteomics. To investigate the state
of a system, samples are subjected to a specific process
that allows the quantitative or qualitative measurement
of some of the variables which describe the system. In
typical biochemical assays one variable (e.g., enzyme
activity) of a single component (e.g., a particular enzyme)
is measured. Using proteomics as an assay, multiple
variables (e.g., expression level, rate of synthesis,
phosphorylation state, etc.) are measured concurrently
on many (ideally all) of the proteins in a sample. The
use of proteomics as an assay is a less far-reaching
proposition than the construction of a comprehensive
proteome database. It does, however, represent a pragmatic
approach which can be adapted to investigate specific
systems and pathways, as long as the interpretation of
the results takes into account that with current technology
not all of the variables which describe the system
can be observed (see Section 3.4).

A common implementation of proteome analysis as a
biological assay is when a 2-DE protein pattern generated
from the analysis of an experimental sample is
compared to an array of reference patterns representing
different states of the system under investigation. The
state of the experimental system at the time the sample
was generated is therefore determined by the quantitative
comparative analysis of hundreds to a few thousand
proteins. Comparative analysis of the 2-DE patterns
furthermore highlights quantitative and qualitative differences
in the protein profiles which correlate with the
state of the system. For this type of analysis it is not
essential that all the proteins are identified or even
visualized, although the results become more informative as
more proteins are compared. It is obvious, however, that
the possibility to identify any protein deemed characteristic
for a particular state dramatically enhances this
approach by opening up new avenues for experimentation.

Figure 7. High resolution 2-DE map of proteins isolated from rat serum
with or without prior exposure to an inflammation-causing agent.
(A) normal rat serum, (B) acute-phase serum from rats which had
previously been exposed to an inflammation-causing agent. The first
dimension of separation is an IPG from pH 4-10, and the second
dimension is a 7.5-17.5%T gradient SDS-PAGE gel. Proteins were
visualized by staining with amido black. Further details of
experimental procedures are included in [14, 49].

Proteome analysis as a biological assay has been successfully
used in the field of toxicology, to characterize
disease states or to study differential activation of cells.
The approach is limited, of course, by the fact that only
the visible protein spots are included in the assay, and it
is well known that a substantial but far from complete
fraction of cellular proteins are detected if a total cell
lysate is separated by 2-DE. Proteins may not be
detected in 2-DE gels because they are not abundant
enough to be visualized by the detection method used,
because they do not migrate within the boundaries (size,
pI) resolved by the gel, because they are not soluble
under the conditions used, or for other reasons.
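
The quantitative and qualitative comparison of an experimental 2-DE
pattern with a reference pattern can be sketched in Python as follows.
This is only an illustration of the idea: the representation of a pattern
as a mapping from spot identifier to a positive intensity, the twofold
cutoff, the function name and the example values are all assumptions of
this sketch and are not taken from the studies cited.

```python
def compare_patterns(reference, sample, min_fold=2.0):
    """Compare two 2-DE patterns given as {spot_id: intensity} mappings.

    Returns (changed, only_reference, only_sample), where `changed` maps
    each shared spot whose intensity differs by at least `min_fold` in
    either direction to its sample/reference ratio. Intensities are
    assumed to be positive.
    """
    changed = {}
    for spot in reference.keys() & sample.keys():
        fold = sample[spot] / reference[spot]
        if fold >= min_fold or fold <= 1.0 / min_fold:
            changed[spot] = fold
    # Qualitative differences: spots detected in only one of the gels.
    only_reference = sorted(reference.keys() - sample.keys())
    only_sample = sorted(sample.keys() - reference.keys())
    return changed, only_reference, only_sample


if __name__ == "__main__":
    # Hypothetical spot intensities in arbitrary units.
    normal = {"spot_1": 100.0, "spot_2": 50.0, "spot_3": 10.0}
    treated = {"spot_1": 105.0, "spot_2": 200.0, "spot_4": 30.0}
    print(compare_patterns(normal, treated))
```

Spots missing from one gel are reported separately because, as noted
above, absence from a 2-DE pattern may reflect the detection limit rather
than true absence of the protein.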

A different way to use proteome analysis as a biological 
assay to define the state of a biological system is to take 
advantage of the wealth of information contained in 
2-DE protein patterns. 2-DE is referred to as two-dimensional
because of the electrophoretic mobility and the
isoelectric points which define the position of each protein
in a 2-DE pattern. In addition to the two dimensions
used to generate the protein patterns, a number of
additional data dimensions are contained in the protein 
patterns. Some of these dimensions such as protein 
expression level, phosphorylation state, subcellular loca- 
tion, association with other proteins, rate of synthesis or 
degradation indicate the activity state of a protein or a 
biological system. Comparative analysis of 2-DE protein 
patterns representing different states is therefore ideally 
suited for the detection, identification and analysis of 
suitable markers. Once again it must be emphasized that 
in this type of experiment only a fraction of the cellular 
proteins is analyzed. Since many regulatory proteins are 
of low abundance, this limitation is a concern, particu- 
larly in cases in which regulatory pathways are being 
investigated. 

5 Concluding remarks 

In this report we have addressed three main issues
related to proteome analysis. First, we have discussed
the rationale for studying proteomes. Second, we have
assessed the technical feasibility of analyzing proteomes
and described current proteome technology, and third,
we have analyzed the utility of proteome analysis for biological
research. It is apparent that proteome analysis is
an essential tool in the analysis of biological systems.
The multi-level control of protein synthesis and degradation
in cells means that only the direct analysis of
mature protein products can reveal their correct identities,
their relevant state of modification and/or association
and their amounts. Recently developed methods
have enabled the identification of proteins at ever-increasing
sensitivity levels and at a high level of automation
of the analytical processes. A number of technical
challenges, however, remain. While it is currently
possible to identify essentially any protein spots that can
be visualized by common staining methods, it is apparent
that without prior enrichment only a relatively
small and highly selected population of long-lived,
highly expressed proteins is observed. There are many
more proteins in a given cell which are not visualized by
such methods. Frequently it is the low abundance proteins
that execute key regulatory functions.



We have outlined the two principal ways proteome anal- 
ysis is currently being used to intersect with biological 
research projects: the proteome as a database or data 
archive and proteome analysis as a biological assay. Both 
approaches have in common that at present they are con- 
ceptually and technically limited. Current proteome data- 
bases typically are limited to one cell type and one state 
of a cell and therefore do not account for the dynamics 
of biological systems. The use of proteome analysis as a 
biological assay can provide a wealth of information, but 
it is limited to the proteins detected and is therefore not 
truly proteome-wide. These limitations in proteomics are 
to a large extent a reflection of the fact that proteins in 
their fully processed form cannot easily be amplified and 
are therefore difficult to isolate in amounts sufficient for
analysis or experimentation. The fact that to date no
complete proteome has been described further attests to
these difficulties. With continued rapid progress in pro- 
tein analysis technology, however, we anticipate that the 
goal of complete proteome analysis will eventually 
become attainable.

We have determined the relationship between mRNA and protein expression levels for selected genes
expressed in the yeast Saccharomyces cerevisiae growing at mid-log phase. The proteins contained in total yeast
cell lysate were separated by high-resolution two-dimensional (2D) gel electrophoresis. Over 150 protein spots
were excised and identified by capillary liquid chromatography-tandem mass spectrometry (LC-MS/MS).
Protein spots were quantified by metabolic labeling and scintillation counting. Corresponding mRNA levels
were calculated from serial analysis of gene expression (SAGE) frequency tables (V. E. Velculescu, L. Zhang,
W. Zhou, J. Vogelstein, M. A. Basrai, D. E. Bassett, Jr., P. Hieter, B. Vogelstein, and K. W. Kinzler, Cell
88:243-251, 1997). We found that the correlation between mRNA and protein levels was insufficient to predict
protein expression levels from quantitative mRNA data. Indeed, for some genes, while the mRNA levels were
of the same value the protein levels varied by more than 20-fold. Conversely, invariant steady-state levels of
certain proteins were observed with respective mRNA transcript levels that varied by as much as 30-fold.
Another interesting observation is that codon bias is not a predictor of either protein or mRNA levels. Our
results clearly delineate the technical boundaries of current approaches for quantitative analysis of protein
expression and reveal that simple deduction from mRNA transcript analysis is insufficient.

EXHIBIT B



The description of the state of a biological system by the 
quantitative measurement of the system constituents is an es- 
sential but largely unexplored area of biology. With recent 
technical advances including the development of differential 
display-PCR (21), of cDNA microarray and DNA chip tech- 
nology (20, 27), and of serial analysis of gene expression 
(SAGE) (34, 35), it is now feasible to establish global and 
quantitative mRNA expression profiles of cells and tissues in 
species for which the sequence of all the genes is known. 
However, there is emerging evidence which suggests that 
mRNA expression patterns are necessary but are by them- 
selves insufficient for the quantitative description of biological 
systems. This evidence includes discoveries of posttranscrip- 
tional mechanisms controlling the protein translation rate (15), 
the half-lives of specific proteins or mRNAs (33), and the 
intracellular location and molecular association of the protein 
products of expressed genes (32). 

Proteome analysis, defined as the analysis of the protein 
complement expressed by a genome (26), has been suggested 
as an approach to the quantitative description of the state of a 
biological system by the quantitative analysis of protein expres- 
sion profiles (36). Proteome analysis is conceptually attractive 
because of its potential to determine properties of biological 
systems that are not apparent by DNA or mRNA sequence 
analysis alone. Such properties include the quantity of protein 
expression, the subcellular location, the state of modification, 
and the association with ligands, as well as the rate of change 
with time of such properties. In contrast to the genomes of a 
number of microorganisms (for a review, see reference 11) and 
the transcriptome of Saccharomyces cerevisiae (35), which have
been entirely determined, no proteome map has been com- 
pleted to date. 

The most common implementation of proteome analysis is 
the combination of two-dimensional gel electrophoresis (2DE)
(isoelectric focusing-sodium dodecyl sulfate [SDS]-polyacrylamide
gel electrophoresis) for the separation and quantitation
of proteins with analytical methods for their identification. 
2DE permits the separation, visualization, and quantitation of 
thousands of proteins reproducibly on a single gel (18, 24). By 
itself, 2DE is strictly a descriptive technique. The combination 
of 2DE with protein analytical techniques has added the pos- 
sibility of establishing the identities of separated proteins (1, 2) 
and thus, in combination with quantitative mRNA analysis, of 
correlating quantitative protein and mRNA expression mea- 
surements of selected genes. 

The recent introduction of mass spectrometric protein anal- 
ysis techniques has dramatically enhanced the throughput and 
sensitivity of protein identification to a level which now permits 
the large-scale analysis of proteins separated by 2DE. The 
techniques have reached a level of sensitivity that permits the 
identification of essentially any protein that is detectable in the 
gels by conventional protein staining (9, 29). Current protein 
analytical technology is based on the mass spectrometric gen- 
eration of peptide fragment patterns that are idiotypic for the 
sequence of a protein. Protein identity is established by corre- 
lating such fragment patterns with sequence databases (10, 22, 
37). Sophisticated computer software (8) has automated the 
entire process such that proteins are routinely identified with 
no human interpretation of peptide fragment patterns. 

In this study, we have analyzed the mRNA and protein levels 
of a group of genes expressed in exponentially growing cells of 
the yeast S. cerevisiae. Protein expression levels were quantified
by metabolic labeling of the yeast proteins to a steady state, 
followed by 2DE and liquid scintillation counting of the se- 
lected, separated protein species. Separated proteins were 
identified by in-gel tryptic digestion of spots with subsequent 
analysis by microspray liquid chromatography-tandem mass 
spectrometry (LC-MS/MS) and sequence database searching. 
The corresponding mRNA transcript levels were calculated 
from SAGE frequency tables (35). 

FIG. 1. Schematic illustration of proteome analysis by 2DE and mass spectrometry. In part I, proteins are separated by 2DE, stained spots are excised and subjected
to in-gel digestion with trypsin, and the resulting peptides are separated by on-line capillary high-performance liquid chromatography. In part II, a peptide is shown
eluting from the column in part I. The peptide is ionized by electrospray ionization and enters the mass spectrometer. The mass of the ionized peptide is detected, and
the first quadrupole mass filter allows only the specific mass-to-charge ratio of the selected peptide ion to pass into the collision cell. In the collision cell, the energized,
ionized peptides collide with neutral argon gas molecules. Fragmentation of the peptide is essentially random but occurs mainly at the peptide bonds, resulting in smaller
peptides of differing lengths (masses). These peptide fragments are detected as a tandem mass (MS/MS) spectrum in the third quadrupole mass filter where two ion
series are recorded simultaneously, one each from sequencing inward from the N and C termini of the peptide, respectively. In part III, the MS/MS spectrum from the
selected, ionized peptide is compared to predicted tandem mass spectra computer generated from a sequence database. Provided that the peptide sequence exists in
the database, the peptide and, by association, the protein from which the peptide was derived can be identified. Unambiguous protein identification is attained in a single
analysis because multiple peptides are identified as being derived from the same protein.

This study, for the first time, explores a quantitative comparison
of mRNA transcript and protein expression levels for
a relatively large number of genes expressed in the same
metabolic state. The resultant correlation is insufficient for
prediction of protein levels from mRNA transcript levels. We have
also compared the relative amounts of protein and mRNA
with the respective codon bias values for the corresponding
genes. This comparison indicates that codon bias by itself is
insufficient to accurately predict either the mRNA or the protein
expression levels of a gene. In addition, the results demonstrate
that only highly expressed proteins are detectable by
2DE separation of total cell lysates and that therefore the
construction of complete proteome maps with current technology
will be very challenging, irrespective of the type of
organism.

MATERIALS AND METHODS 

Yeast strain and growth conditions. The source of protein and message
transcripts for all experiments was YPH499 (MATa ura3-52 lys2-801 ade2-101
leu2-Δ1 his3-Δ200 trp1-Δ63) (30). Logarithmically growing cells were obtained by 
growing yeast cells to early log phase (3 X 10** cells/ml) in YPD rich medium 
(YPD supplemented with 6 mM uracil, 4.8 mM adenine, and 24 mM tryptophan) 
at 30°C (35). Metabolic labeling of protein was accomplished in YPD medium 



exactly as described elsewhere (4) with the exception that 1 ml of cells was 
labeled with 3 mCi to offset methionine present in YPD medium. Protein was 
harvested as described by Garrels and coworkers (12). Harvested protein was 
lyophilized, resuspended in isoelectric focusing gel rehydration solution, and 
stored at -80°C. 

2DE. Soluble proteins were run in the first dimension by using a commercial
flatbed electrophoresis system (Multiphor II; Pharmacia Biotech). Immobilized
polyacrylamide gel (IPG) dry strips with nonlinear pH 3.0 to 10.0 gradients
(Amersham-Pharmacia Biotech) were used for the first-dimension separation.
Forty micrograms of protein from whole-cell lysates was mixed with IPG strip
rehydration buffer (8 M urea, 2% Nonidet P-40, 10 mM dithiothreitol), and 250
to 380 µl of solution was added to individual lanes of an IPG strip rehydration
tray (Amersham-Pharmacia Biotech). The strips were allowed to rehydrate at
room temperature for 1 h. The samples were run at 300 V-10 mA-5 W for 2 h,
then ramped to 3,500 V-10 mA-5 W over a period of 3 h, and then kept at 3,500
V-10 mA-5 W for 15 to 19 h. At the end of the first-dimension run (60 to 70
kV·h), the IPG strips were reequilibrated for 8 min in 2% (wt/vol)
dithiothreitol in 2% (wt/vol) SDS-6 M urea-30% (wt/vol) glycerol-0.05 M Tris HCl
(pH 6.8) and for 4 min in 2.5% iodoacetamide in 2% (wt/vol) SDS-6 M urea-30%
(wt/vol) glycerol-0.05 M Tris HCl (pH 6.8). Following reequilibration, the strips
were transferred and apposed to 10% polyacrylamide second-dimension gels.
Polyacrylamide gels were poured in a casting stand with 10% acrylamide-2.67%
piperazine diacrylamide-0.375 M Tris base-HCl (pH 8.8)-0.1% (wt/vol) SDS-0.05%
(wt/vol) ammonium persulfate-0.05% TEMED (N,N,N',N'-tetramethylethylenediamine)
in Milli-Q water. The apparatus used to run second-dimension gels
was a noncommercial apparatus from Oxford Glycosciences, Inc. Once the IPG
strips were apposed to the second-dimension gels, they were immediately run at
50 mA (constant)-500 V-85 W for 20 min, followed by 200 mA (constant)-500
V-85 W until the buffer front line was 10 to 15 mm from the bottom of the gel.
Gels were removed and silver stained according to the procedure of Shevchenko
et al. (29).

FIG. 2. 2D silver-stained gel of the proteins in yeast total cell lysate. Proteins were separated in the first dimension (horizontal) by isoelectric focusing and then in
the second dimension (vertical) by molecular weight sieving. Protein spots (156) were chosen to include the entire range of molecular weights, isoelectric focusing points,
and staining intensities. Spots were excised, and the corresponding protein was identified by mass spectrometry and database searching. The spots are labeled on the
gel and correspond to the data presented in Table 1. Molecular weights are given in thousands.

Protein identification. Gels were exposed to X-ray film overnight, and then the 
silver staining and film were used to excise 156 spots of varying intensities, 
molecular weights, and isoelectric focusing points. In order to increase the 
detection limit by mass spectrometry, spots were cut out and pooled from up to 
four identical cold, silver-stained gels. In-gel tryptic digests of pooled spots were 
performed as described previously (29). Tryptic peptides were analyzed by
microcapillary LC-MS with automated switching to MS/MS mode for peptide 
fragmentation. Spectra were searched against the composite OWL protein se- 
quence database (version 30.2; 250,514 protein sequences) (24a) by using the 
computer program Sequest (8), which matches theoretical and acquired tandem 
mass spectra. A protein match was determined by comparing the number of 
peptides identified and their respective cross-correlation scores. All protein 
identifications were verified by comparison with theoretical molecular weights 
and isoelectric points. 



mRNA quantitation. Velculescu and coworkers have previously generated 
frequency tables for yeast mRNA transcripts from the same strain grown under 
the same stated conditions as described herein (35). The SAGE technology is 
based on two main principles. First, a short sequence tag (15 bp) that contains 
sufficient information uniquely to identify a transcript is generated. A single tag 
is usually generated from each mRNA transcript in the cell which corresponds to 
15 bp at the 3'-most cutting site for NlaIII. Second, many transcript tags can be 
concatenated into a single molecule and then sequenced, revealing the identity of 
multiple tags simultaneously. Over 20,000 transcripts were sequenced from yeast 
strain YPH499 growing at mid-log phase on glucose. Assuming the previously 
derived estimate of 15,000 mRNA molecules per cell (16), this would represent 
a 1.3-fold coverage even for mRNA molecules present at a single copy per cell 
and would provide a 72% probability of detecting such transcripts. Computer 
software which took for input the gene detected, examined the nucleotide se- 
quence, and performed the calculation as described by Velculescu and coworkers 
(35) was written. In practice, we found that for 21 of 128 (16%) genes examined 
viable mRNA levels from SAGE data could not be calculated. This was because 
(i) no CATG site was found in the open reading frame (ORF), (ii) a CATG site 
was found but the corresponding 10-bp putative SAGE tag was not found in the 
frequency tables, or (iii) identical putative SAGE tags were present for multiple 
genes (e.g., TDH2_YEAST and TDH3_YEAST). 
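
The coverage estimate and the tag lookup described above can be sketched in a few lines of Python. This is a minimal illustration, not the software written for the study: the Poisson approximation for the detection probability, the function names, and the choice to take the 10 bases immediately downstream of the 3'-most CATG within the ORF are all assumptions made here for clarity.

```python
import math
from typing import Optional

NLAIII_SITE = "CATG"  # recognition sequence of NlaIII, as named in the text
TAG_LENGTH = 10       # assumed tag length, taken from the "10-bp putative SAGE tag"


def detection_probability(fold_coverage: float) -> float:
    """Poisson approximation (an assumption): probability that a transcript
    sampled at the given mean coverage is seen at least once."""
    return 1.0 - math.exp(-fold_coverage)


def putative_sage_tag(orf: str) -> Optional[str]:
    """Return the bases following the 3'-most CATG in the ORF, or None when
    the ORF has no CATG site or too few bases remain after it."""
    seq = orf.upper()
    site = seq.rfind(NLAIII_SITE)  # rfind gives the last (3'-most) occurrence
    if site == -1:
        return None
    start = site + len(NLAIII_SITE)
    tag = seq[start:start + TAG_LENGTH]
    return tag if len(tag) == TAG_LENGTH else None


# 1.3-fold coverage of a single-copy transcript, as quoted in the text.
print(round(detection_probability(1.3), 3))
# Hypothetical toy sequence, used only to exercise the function.
print(putative_sage_tag("ATGCATGAAACCCGGGTTTACGTAA"))
```

Under this approximation, 1.3-fold coverage gives a detection probability of roughly 0.73, close to the 72% quoted above. A gene for which `putative_sage_tag` returns `None` corresponds to exclusion case (i); cases (ii) and (iii) would additionally require the frequency tables themselves.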
TABLE 1. Expressed genes identified from 2D gel in Fig. 2

Mol wt     pI      Spot no.  YPD gene   Protein abundance   mRNA abundance  Codon
                             name^a     (10^3 copies/cell)  (copies/cell)   bias

17,259     6.75    133       CPR1       15.2                61.7            0.769
18,702     4.80    83        EGD2       20.1                5.2             0.724
18,726     4.44    147       YKL056C    61.2                88.4            0.831
18,978     5.95    135       YER067W    3.7                 6.7             0.118
19,108     5.04    130       YLR109W    94.4                9.7             0.680
19,681     9.08    136       ATP7       11.0                NA^c            0.246
20,505     6.07    111       GUK1       16.5                3.7             0.422
21,444     5.25    148       SAR1       5.4                 10.4            0.455
21,583     4.98    95        TSA1       110.6               40.1            0.845
22,602     4.30    80        EFB1       66.1                23.8            0.875
23,079     6.29    112       SOD2       12.6                2.2             0.351
23,743     5.44    137       HSP26      NA^d                0.7             0.434
24,033     5.97    96        ADK1       17.4                16.4            0.656
24,058     4.43    143       YKL117W    29.2                10.4            0.339
24,353     6.30    140       TFS1       8.1                 0.7             0.146
24,662     5.85    99        URA5       25.4                6.0             0.359
24,808     6.33    97        GSP1       26.3                5.2             0.735
24,908     8.73    122       RPS5       18.6                NA^c            0.899
25,081     4.65    81        MRP8       9.3                 NA^c            0.241
25,960     6.06    116       RPE1       5.8                 0.7             0.372
26,378     9.55    127       RPS3       96.8                NA^c            0.863
26,467     5.18    100       VMA4       10.5                3.7             0.427
26,661     5.84    98        TPI1       NA^d                NA^c            0.900
27,156     5.56    93        PRE8       6.9                 0.7             0.129
27,334     6.13    115       YHR049W    18.4                2.2             0.520
27,472     5.33    92        YNL010W    31.6                3.7             0.421
27,480     8.95    123       GPM1       10.0                169.4           0.902
27,480     8.95    124       GPM1       231.4               169.4           0.902
27,480     8.95    125       GPM1       7.5                 169.4           0.902
27,809     5.97    139       HOR2       5.7                 0.7             0.381
27,874     4.46    78        YST1       13.6                52.8            0.805
28,595     4.51    41        PUP2       4.4                 0.7             0.147
29,156     6.59    114       YMR226C    14.5                2.2             0.283
29,244     8.40    120       DPM1       5.0                 11.2            0.362
29,443     5.91    48        PRE4       3.4                 3.7             0.162
30,012     6.39    138       PRB1       21.2                1.5             0.449
30,073     4.63    77        BMH1       14.7                28.2            0.454
30,296     7.94    121       OMP2       67.4                41.6            0.499
30,435     6.34    89        GPP1       70.2                11.2            0.703
31,332     5.57    88        ILV6       13.9                3.0             0.402
32,159     5.46    113       IPP1       63.1                3.7             0.752
32,263     6.00    149       HIS1       22.4                4.5             0.232
33,311     5.35    84        SPE3       15.1                6.7             0.468
34,465     5.60    129       ADE1       8.7                 5.2             0.305
34,762     5.32    85        SEC14      10.9                6.0             0.373
34,797     5.85    42        URA1       49.5                8.9             0.237
34,799     6.04    90        BEL1       103.2               81.0            0.875
35,556     5.97    43        YDL124W    6.4                 4.5             0.206
35,619     8.41    59        TDH1       69.8                32.7^c          0.940
35,650     5.49    68        CAR1       5.2                 3.0             0.339
35,712     6.72    117       TDH2       49.6                473.0^c         0.982
35,712     6.72    154       TDH2       863.5               473.0^c         0.982
35,712     6.72    155       TDH2       79.4                473.0^c         0.982
36,272     4.85    128       APA1       8.7                 0.7             0.425
36,358     5.05    75        YJR105W    17.6                17.1            0.522
36,358     5.05    76        YJR105W    27.5                17.1            0.522
36,596     6.37    79        ADH2       58.9                260.0^c         0.711
36,714     6.30    102       ADH1       746.1               260.0           0.913
36,714     6.30    103       ADH1       17.6                260.0           0.913
36,714     6.30    104       ADH1       61.4                260.0           0.913
36,714     6.30    105       ADH1       52.7                260.0           0.913
37,033     6.23    44        TAL1       44.8                3.7             0.701
37,796     7.36    57        IDH2       29.4                6.7             0.330
37,886     6.49    106       ILV5       76.0                4.5             0.892
38,700     7.83    55        BAT1       30.9                11.2            0.469
38,702     6.24    46        QCR2       NA^d                2.2             0.326
39,477     5.58    86        FBA1       17.8                183.6           0.935
39,477     5.58    87        FBA1       427.2               183.6           0.935
39,540     6.50    150       HOM2       60.3                4.5             0.592
39,561     6.12    156       PSA1       96.4                27.5            0.718
41,158     6.01    49        YNL134C    14.9                1.5             0.316
41,623     7.18    58        BAT2       19.0                8.9             0.250
41,728     7.29    110       ERG10      24.1                4.5             0.543
41,900     5.42    74        TOM40      22.3                2.2             0.375
42,402     6.29    45        CYS3       6.7                 8.9             0.621
42,883     5.63    67        DYS1       15.8                5.2             0.526
43,409     6.31    107       SER1       10.5                1.5             0.292
43,421     5.59    91        ERG6       2.2                 14.1            0.408
44,174     7.32    56        YBR025C    13.1                6.0             0.684
44,682     4.99    72        TIF1       2.9                 39.4            0.834
44,707     7.77    108       PGK1       23.7                165.7           0.897
44,707     7.77    109       PGK1       315.2               165.7           0.897
46,080     6.72    30        CAR2       15.4                NA^c            0.495
46,383     8.52    53        IDP1       7.7                 0.7             0.436
46,553     5.98    47        IDP2       32.4                NA^c            0.197
46,679     6.39    50        ENO1       35.4                0.7             0.930
46,679     6.39    51        ENO1       6.6                 0.7             0.930
46,679     6.39    52        ENO1       2.2                 0.7             0.930
46,773     5.82    63        ENO2       15.5                289.1           0.960
46,773     5.82    64        ENO2       635.5               289.1           0.960
46,773     5.82    65        ENO2       93.0                289.1           0.960
46,773     5.82    66        ENO2       31.0                289.1           0.960
47,402     6.09    126       COR1       2.5                 0.7             0.422
47,666     8.98    54        AAT2       11.7                6.0             0.338
48,364     5.25    73        WTM1       74.5                13.4            0.365
48,530     6.20    61        MET17      38.1                29.0            0.576
48,904     5.18    69        LYS9       16.2                3.7             0.463
48,987     4.90    153       SUP45      29.6                11.9            0.377
49,727     5.47    70        PRO2       13.6                5.2             0.297
49,912     9.27    62        TEF2       558.5               282.0           0.932
50,444     5.67    35        YDR190C    4.8                 2.2             0.228
50,837     6.11    32        YEL047C    3.8                 1.5             0.387
50,891     4.59    151       TUB2       11.2                7.4             0.404
51,547     6.80    27        LPD1       18.9                2.2             0.351
52,216     7.25    29        SHM2       19.7                7.4             0.722
52,859     5.54    37        YFR044C    30.2                6.7             0.442
53,798     5.19    71        HXK2       26.5                7.4             0.756
53,803     6.05    145       GYP6       4.4                 0.7             0.147
54,403     5.29    39        ALD6       37.7                2.2             0.664
54,403     5.29    40        ALD6       6.6                 2.2             0.664
54,502     6.20    31        ADE13      6.3                 1.5             0.417
54,543     7.75    25        PYK1       225.3               101.8           0.965
54,543     7.75    26        PYK1       39.8                101.8           0.965
55,221     6.66    146       YEL071W    16.3                3.0             0.244
55,295     4.35    134       PDI1       66.2                14.1            0.589
55,364     5.98    24        GLK1       22.6                6.0             0.237
55,481     7.97    118       ATP1       21.6                2.2             0.637
55,886     6.47    28        CYS4       22.2                NA^c            0.444
56,167     5.83    33        ARO8       14.3                3.0             0.324
56,167     5.83    34        ARO8       9.1                 3.0             0.324
56,584     6.36    20        CYB2       18.9                NA^c            0.259
57,366     5.53    60        FRS2       2.3                 0.7             0.451
57,383     5.98    144       ZWF1       5.6                 0.7             0.215
57,464     5.49    36        THR4       21.4                3.7             0.508
57,512     5.50    7         SRV2       6.5                 NA^c            0.260
57,727     4.92    152       VMA2       33.7                8.9             0.546
58,573     6.47    17        ACH1       4.4                 1.5             0.327
58,573     6.47    18        ACH1       5.4                 1.5             0.327
61,353     5.87    21        PDC1       6.5                 200.7           0.962
61,353     5.87    22        PDC1       303.2               200.7           0.962
61,353     5.87    23        PDC1       16.3                200.7           0.962
61,649     5.54    38        CCT8       2.2                 1.5             0.271
61,902     6.21    101       PDC5       4.3                 NA^c            0.828
62,266     6.19    16        ICL1       20.1                NA^c            0.327
62,862     8.02    19        ILV3       5.3                 4.5             0.548
63,082     6.40    119       PGM2       2.2                 3.0             0.402
64,335     5.77    5         PAB1       30.4                1.5             0.616
66,120     5.42    8         STI1       6.7                 0.7             0.313
66,120     5.42    9         STI1       6.4                 0.7             0.313
66,450     5.29    141       SSB2       7.0                 NA^c            0.880
66,450     5.29    142       SSB2       2.3                 NA^c            0.880
66,456     5.23    10        SSB1       64.5                79.5            0.907
66,456     5.23    11        SSB1       59.0                79.5            0.907
66,456     5.23    12        SSB1       13.7                79.5            0.907
68,397     5.82    82        LEU4       3.1                 3.0             0.407
69,313     4.90    13        SSA2       24.3                18.6            0.892
69,313     4.90    14        SSA2       77.1                18.6            0.892
74,378     8.46    15        YKL029C    2.8                 3.7             0.353
75,396     5.82    6         GRS1       5.5                 7.4             0.500
85,720     6.25    1         MET6       2.0                 NA^c            0.772
85,720     6.25    2         MET6       10.9                NA^c            0.772
85,720     6.25    3         MET6       1.4                 NA^c            0.772
93,276     6.11    131       EFT1       17.9                41.6            0.890
93,276     6.11    132       EFT1       5.7                 41.6            0.890
102,064^e  6.6r    94        ADE3       4.8                 5.2             0.423
107,482^e  5.33^e  4         MCM3       2.7                 NA^c            0.240

^a YPD gene names are available from the YPD website (39).
^b NA, calculation could not be performed or was not available.
^c mRNA data inconclusive or NA.
^d No methionines in predicted ORF; therefore, protein concentration was not
determined.
^e Measured molecular weight or pI did not match theoretical molecular weight
or pI.



Protein quantitation. [35S]methionine-labeled gels were exposed to X-ray film
overnight, and then the silver stain and film were used to excise 156 spots of
varying intensities, molecular weights, and pIs. The excised spots were placed in
0.6-ml microcentrifuge tubes, and scintillation cocktail (100 µl) was added. The
samples were vortexed and counted. In addition, two parallel gels were
electroblotted to polyvinylidene difluoride membranes. The membranes were exposed
to X-ray film, and four intense single spots were excised from each membrane
and subjected to amino acid analysis. For these four spots, a mean of 209 ± 4
cpm/pmol of protein/methionine was found. This number was used to quantitate
all remaining spots in conjunction with the number of methionines present in the
protein.
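
The conversion from scintillation counts to protein amount follows directly from the calibration above and can be sketched as follows. Only the 209 cpm/pmol/methionine figure comes from the text; the function names and the `cells_loaded` parameter (the number of cells represented by the protein loaded on the gel, which the text does not state) are hypothetical and introduced only for illustration.

```python
AVOGADRO = 6.02214076e23        # molecules per mole
CPM_PER_PMOL_PER_MET = 209.0    # calibration reported in the text


def protein_pmol(spot_cpm: float, n_methionines: int) -> float:
    """Picomoles of protein in a spot, given its counts per minute and the
    number of methionines in the protein sequence."""
    if n_methionines <= 0:
        # Matches the table footnote: no methionine, no quantitation.
        raise ValueError("protein has no methionine; cannot be quantified")
    return spot_cpm / (CPM_PER_PMOL_PER_MET * n_methionines)


def copies_per_cell(spot_cpm: float, n_methionines: int, cells_loaded: float) -> float:
    """Protein copies per cell; cells_loaded is a hypothetical input."""
    molecules = protein_pmol(spot_cpm, n_methionines) * 1e-12 * AVOGADRO
    return molecules / cells_loaded


# A spot with 2,090 cpm from a protein containing 10 methionines.
print(protein_pmol(2090.0, 10))
```

A protein without methionine raises an error here, which mirrors why such proteins have no protein abundance entry in Table 1.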

To ensure that proteins were labeled to equilibrium, parallel 2D gels were 
prepared and run on yeast metabolically labeled for 1, 2, 6, or 18 h. The 
corresponding 156 spots were excised from each gel, and radioactivity was mea- 
sured by liquid scintillation counting for each spot. Calculated protein levels were 
highly reproducible for all time points measured after 1 h. 

Calculation of codon bias and predicted half-life. Codon bias values were 
extracted from the YPD spreadsheet (17). Protein half-lives were calculated 
based on the N-end rule (33). When the N-terminal processing was not known 
experimentally, it was predicted based on the affinity of methionine
aminopeptidase (31).

RESULTS 

Characteristics of proteome approach. Nearly every facet of 
proteome analysis hinges on the unambiguous identification of 
large numbers of expressed proteins in cells. Several techniques
have been described previously for the identification of 
proteins separated by 2DE, including N-terminal and internal 
sequencing (1, 2), amino acid analysis (38), and more recently 
mass spectrometry (25). We utilized techniques based on mass 
spectrometry because they afford the highest levels of sensitivity
and provide unambiguous identification. The specific procedure
used is schematically illustrated in Fig. 1 and is based 
on three principles. First, proteins are removed from the gel by 



proteolytic in-gel digestion, and the resulting peptides are sep- 
arated by on-line capillary high-performance liquid chromatog- 
raphy. Second, the eluting peptides are ionized and detected, and 
the specific peptide ions are selected and fragmented by the 
mass spectrometer. To achieve this, the mass spectrometer 
switches between the MS mode (for peptide mass identifica- 
tion) and the MS/MS mode (for peptide characterization and 
sequencing). Selected peptides are fragmented by a process 
called collision-induced dissociation (CID) to generate a tan- 
dem mass spectrum (MS/MS spectrum) that contains the pep- 
tide sequence information. Third, individual CID mass spectra 
are then compared by computer algorithms to predicted spec- 
tra from a sequence database. This results in the identification 
of the peptide and, by association, the protein(s) in the spot. 
Unambiguous protein identification is attained in a single anal- 
ysis by the detection of multiple peptides derived from the 
same protein. 

Protein identification. Yeast total cell protein lysate (40 µg),
metabolically labeled with [35S]methionine, was electrophoretically
separated by isoelectric focusing in the first dimension
and by SDS-10% polyacrylamide gel electrophoresis in 
the second dimension. Proteins were visualized by silver stain- 
ing and by autoradiography. Of the more than 1,000 proteins 
visible by silver staining, 156 spots were excised from the gel 
and subjected to in-gel tryptic digestion, and the resulting 
peptides were analyzed and identified by microspray LC- 
MS/MS techniques as described above. The proteins in this 
study were all identified automatically by computer software 
with no human interpretation of mass spectra. They are indi- 
cated in Fig. 2 and detailed in Table 1. 

The CID spectra shown in Fig. 3 indicate that the quality of 
the identification data generated was suitable for unambiguous 
protein identification. The spectra represent the amino acid 
sequences of tryptic peptides NSGDIVNLGSIAGR (Fig. 3A) 
and FAVGAFTDSLR (Fig. 3B). Both peptides were derived 
from protein S57593 (hypothetical protein YMR226C), which 
migrated to spot 114 (molecular weight, 29,156; pI, 6.59) in the 
2D gel in Fig. 2. Five other peptides from the same analysis 
were also computer matched to the same protein sequence. 
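
As a plausibility check on these assignments, the expected mass-to-charge ratio of each peptide can be computed from standard monoisotopic residue masses and compared with the precursor values given in the legend to Fig. 3 (687.2 and 592.6). The sketch below assumes doubly protonated ions, which the text does not state, and is not how Sequest scores a match.

```python
# Standard monoisotopic residue masses (Da).
MONOISOTOPIC = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # added once per peptide for the free termini
PROTON = 1.007276


def peptide_mz(sequence: str, charge: int = 2) -> float:
    """m/z of an unmodified peptide at the given charge (2+ is assumed)."""
    neutral = sum(MONOISOTOPIC[aa] for aa in sequence) + WATER
    return (neutral + charge * PROTON) / charge


for peptide, reported in (("NSGDIVNLGSIAGR", 687.2), ("FAVGAFTDSLR", 592.6)):
    print(peptide, round(peptide_mz(peptide), 2), reported)
```

Both computed values fall within about 0.5 m/z of the reported precursor values. The small positive offset of the reported values would be expected if the instrument reported something closer to average than monoisotopic mass, although that is an inference rather than a statement from the text.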

Protein and mRNA quantitation. For the 156 genes investi- 
gated, the protein expression levels ranged from 2,200 (PGM2) 
to 863,000 (TDH2/TDH3) copies/cell. The levels of mRNA for 
each of the genes identified were calculated from SAGE fre- 
quency tables (35). These tables contain the mRNA levels for 
4,665 genes in yeast strain YPH499 grown to mid-log phase in 
YPD medium on glucose as a carbon source. In some in- 
stances, the mRNA levels could not be calculated for reasons 
stated in Materials and Methods. For the proteins analyzed in 
this study, mean transcript levels varied from 0.7 to 473 copies/ 
cell. 

Selection of the sample population for mRNA-protein ex- 
pression level correlation. The protein spots selected for iden- 
tification were selected from spots visible by silver staining in 
the 2D gel. An attempt was made not to include spots where 
overlap with other spots was readily apparent. The number of 
proteins identified was 156 (Table 1). Some proteins migrated 
to more than one spot (presumably due to differential protein 
processing or modifications), and protein levels from these 
spots were calculated by integrating the intensities of the dif- 
ferent spots. The 156 protein spots analyzed represented the 
products of 128 different genes. Genes were excluded from the 
correlation analysis only if part of the data set was missing; i.e., 
genes were excluded if (i) no mRNA expression data were 
available for the protein or putative SAGE tags were ambig- 
uous, (ii) the amino acid sequence did not contain methionine, 
(iii) more than a single protein was conclusively identified as 

FIG. 3. Tandem mass (MS/MS) spectra resulting from analysis of a single spot on a 2D gel. The first quadrupole selected a single mass-to-charge ratio (m/z) of 687.2
(A) or 592.6 (B), while the collision cell was filled with argon gas, and a voltage which caused the peptide to undergo fragmentation by CID was applied. The third
quadrupole scanned the mass range from 50 to 1,400 m/z. The computer program Sequest (8) was utilized to match MS/MS spectra to amino acid sequence by database
searching. Both spectra matched peptides from the same protein, S57593 (yeast hypothetical protein YMR226C). Five other peptides from the same analysis were
matched to the same protein.



migrating to the same gel spot, or (iv) the theoretical and 
observed pIs and molecular weights could not be reconciled. 
After these criteria were applied, the number of genes used in 
the correlation analysis was 106. 
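
The step of combining spots that belong to the same gene can be sketched as a simple group-and-sum. The rows below are copied from Table 1; reading "integrating the intensities" as summing the per-spot protein abundances is an assumption, and the function name is hypothetical.

```python
from collections import defaultdict

# (spot no., gene, protein abundance in 10^3 copies/cell), from Table 1.
SPOTS = [
    (63, "ENO2", 15.5), (64, "ENO2", 635.5), (65, "ENO2", 93.0), (66, "ENO2", 31.0),
    (108, "PGK1", 23.7), (109, "PGK1", 315.2),
    (62, "TEF2", 558.5),
]


def protein_per_gene(spots):
    """Sum spot-level abundances for each gene (assumed interpretation)."""
    totals = defaultdict(float)
    for _spot_no, gene, abundance in spots:
        totals[gene] += abundance
    return dict(totals)


for gene, total in protein_per_gene(SPOTS).items():
    print(gene, round(total, 1))
```

Genes that fail any of the four exclusion criteria would be dropped before this step, which is how 128 genes are reduced to the 106 used in the correlation.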



Codon bias and predicted half-lives. Codon bias is thought 
to be an indicator of protein expression, with highly expressed 
proteins having large codon bias values. The codon bias distri- 
bution for the entire set of more than 6,000 predicted yeast 
gene ORFs is presented in Fig. 4A. The interval with the 
largest frequency of genes is between the codon bias values of 
0.0 and 0.1. This segment contains more than 2,500 genes. The 
distribution of the codon bias values of the 128 different genes 
found in this study (all protein spots from Fig. 2) is shown in 
Fig. 4B, and protein half-lives (predicted from applying the 
N-end rule [33] to the experimentally determined or predicted 
protein N termini) are shown in Fig. 4C. No genes were iden- 
tified with codon bias values less than 0.1 even though thou- 
sands of genes exist in this category. In addition, nearly all of 
the proteins identified had long predicted half-lives (greater 
than 30 h). 

Correlation of mRNA and protein expression levels. The 
correlation between mRNA and protein levels of the genes 
selected as described above is shown in Fig. 5. For the entire 
group (106 genes) for which a complete data set was gener- 
ated, there was a general trend of increased protein levels 
resulting from increased mRNA levels. The Pearson product 
moment correlation coefficient for the whole data set (106 
genes) was 0.935. This number is highly biased by a small 
number of genes with very large protein and message levels. A 
more representative subset of the data is shown in the inset of 
Fig. 5. It shows genes for which the message level was below 10 
copies/cell and includes 69% (73 of 106 genes) of the data used 
in the study. The Pearson product moment correlation coeffi- 
cient for this data set was only 0.356. We also found that levels 
of protein expression coded for by mRNA with comparable 
abundance varied by as much as 30-fold and that the mRNA 
levels coding for proteins with comparable expression levels 
varied by as much as 20-fold. 

The distortion of the correlation value induced by the un- 
even distribution of the data points along the x axis is further 
demonstrated by the analysis in Fig. 6. The 106 samples in- 
cluded in the study were ranked by protein abundance, and the 
Pearson product moment correlation coefficient was repeat- 
edly calculated after including progressively more, and higher- 
abundance, proteins in each calculation. The correlation values 
remained relatively stable in the range of 0.1 to 0.4 if the 
lowest-expressed 40 to 95 proteins used in this study were 
included. However, the correlation value steadily climbed by 
the inclusion of each of the 11 very highly expressed proteins. 
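
The ranking analysis can be illustrated with a short sketch that computes the Pearson coefficient for progressively larger subsets ordered by protein abundance. The eleven (mRNA, protein) pairs are taken from Table 1, with multi-spot genes summed; they are a small illustrative subset chosen here, so the numbers will not reproduce the coefficients reported for the full 106-gene data set.

```python
import math


def pearson(xs, ys):
    """Pearson product moment correlation coefficient."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return float("nan")  # undefined when either variable is constant
    return sxy / math.sqrt(sxx * syy)


def cumulative_correlation(pairs, start=3):
    """Rank (mRNA, protein) pairs by protein abundance and compute r for the
    k lowest-abundance proteins, for k = start .. len(pairs)."""
    ranked = sorted(pairs, key=lambda pair: pair[1])
    results = []
    for k in range(start, len(ranked) + 1):
        subset = ranked[:k]
        r = pearson([m for m, _ in subset], [p for _, p in subset])
        results.append((k, r))
    return results


# (mRNA copies/cell, protein in 10^3 copies/cell); subset of Table 1.
PAIRS = [
    (5.2, 20.1), (4.5, 60.3), (1.5, 14.9), (8.9, 19.0),     # EGD2, HOM2, YNL134C, BAT2
    (3.7, 16.5), (2.2, 22.3), (4.5, 76.0), (3.7, 44.8),     # GUK1, TOM40, ILV5, TAL1
    (165.7, 338.9), (282.0, 558.5), (289.1, 775.0),         # PGK1, TEF2, ENO2
]

for k, r in cumulative_correlation(PAIRS):
    print(k, round(r, 3))
```

Even in this small subset the coefficient stays low while only low-abundance proteins are included and rises sharply once the few very abundant ones enter the calculation, which is the effect described above.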

Correlation of protein and mRNA expression levels with 
codon bias. Codon bias is the propensity for a gene to utilize 
the same codon to encode an amino acid even though other 
codons would insert the identical amino acid in the growing 
polypeptide sequence. It is further thought that highly ex- 
pressed proteins have large codon biases (3). To assess the 
value of codon bias for predicting mRNA and protein levels in 
exponentially growing yeast cells, we plotted the two experi- 
mental sets of data versus the codon bias (Fig. 7). The distri- 
bution patterns for both mRNA and protein levels with respect 
to codon bias were highly similar. There was high variability in 
the data within the codon bias range of 0.8 to 1.0. Although a 
large codon bias generally resulted in higher protein and mes- 
sage expression levels, codon bias did not appear to be predic- 
tive of either protein levels or mRNA levels in the cell. 

DISCUSSION 

The desired end point for the description of a biological 
system is not the analysis of mRNA transcript levels alone but 
also the accurate measurement of protein expression levels and 
their respective activities. Quantitative analysis of global 
mRNA levels currently is a preferred method for the analysis 
of the state of cells and tissues (11). Several methods which 
either provide absolute mRNA abundance (34, 35) or relative 

FIG. 4. Current proteome analysis technology utilizing 2DE without preenrichment
samples mainly highly expressed and long-lived proteins. Genes encoding
highly expressed proteins generally have large codon bias values. (A) Distribution
of the yeast genome (more than 6,000 genes) based on codon bias. The
interval with the largest frequency of genes is 0.0 to 0.1, with more than 2,500
genes. (B) Distribution of the genes from identified proteins in this study based
on codon bias. No genes with codon bias values less than 0.1 were detected in this
study. (C) Distribution of identified proteins in this study based on predicted
half-life (estimated by N-end rule).



mRNA levels in comparative analyses (20, 27) have been described
elsewhere. The techniques are fast and exquisitely sensitive
and can provide mRNA abundance for potentially any 
expressed gene. Measured mRNA levels are often implicitly or 
explicitly extrapolated to indicate the levels of activity of the 
corresponding protein in the cell. Quantitative analysis of pro- 
tein expression levels (proteome analysis) is much more time- 
consuming because proteins are analyzed sequentially one by 
one and is not general because analyses are limited to the 
relatively highly expressed proteins. Proteome analysis does, 
however, provide types of data that are of critical importance 
for the description of the state of a biological system and that 
are not readily apparent from the sequence and the level of 
expression of the mRNA transcript. This study attempts to 
examine the relationship between mRNA and protein expres- 
sion levels for a large number of expressed genes in cells 
representing the same state. 

Limits in the sensitivity of current protein analysis technol- 
ogy precluded a completely random sampling of yeast proteins. 
We therefore based the study on those proteins visible by silver 

FIG. 5. Correlation between protein and mRNA levels for 106 genes in yeast growing at log phase with glucose as a carbon source. mRNA and protein levels were
calculated as described in Materials and Methods. The data represent a population of genes with protein expression levels visible by silver staining on a 2D gel chosen
to include the entire range of molecular weights, isoelectric focusing points, and staining intensities. The inset shows the low-end portion of the main figure. It contains
69% of the original data set. The Pearson product moment correlation for the entire data set was 0.935. The correlation for the inset containing 73 proteins (69%) was
only 0.356.



staining on a 2D gel. Of the more than 1,000 visible spots, 156 
were chosen to include the entire range of molecular weights, 
isoelectric focusing points, and staining intensities displayed on 
the 2D protein pattern. The genes identified in this study 
shared a number of properties. First, all of the proteins in this 
study had a codon bias of greater than 0.1 and 93% were 
greater than 0.2 (Fig. 4B). Second, with few exceptions, the 
proteins in this study had long predicted half-lives according to 
the N-end rule (Fig. 4C). Third, low-abundance proteins with 
regulatory functions such as transcription factors or protein 
kinases were not identified. 

Because the population of proteins used in this study ap- 
pears to be fairly homogeneous with respect to predicted half- 
life and codon bias, it might be expected that the correlation of 
the mRNA and protein expression levels would be stronger for 
this population than for a random sample of yeast proteins. We 
tested this assumption by evaluating the correlation value if 
different subsets of the available data were included in the 
calculation. The 106 proteins were ranked from lowest to high- 
est protein expression level, and the trend in the correlation 
value was evaluated by progressively including more of the 
higher-abundance proteins in the calculation (Fig. 6). The cor- 
relation value when only the lower-abundance 40 to 93 pro- 
teins were examined was consistently between 0.1 and 0.4. If 
the 11 most abundant proteins were included, the correlation 
steadily increased to 0.94. We therefore expect that the corre- 
lation for all yeast proteins or for a random selection would be 
less than 0.4. The observed level of correlation between 
mRNA and protein expression levels suggests the importance 



of posttranscriptional mechanisms controlling gene expression. 
Such mechanisms include translational control (15) and con- 
trol of protein half-life (33). Since these mechanisms are also 
active in higher eukaryotic cells, we speculate that there is no 
predictive correlation between steady-state levels of mRNA 
and those of protein in mammalian cells. 
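The subset analysis described in this paragraph can be illustrated with a short Python sketch. It is not the calculation used in the study: the function names `pearson` and `progressive_correlation` are hypothetical, and the input values are synthetic numbers chosen only to mimic a weakly correlated low-abundance population plus 11 abundant genes that track their mRNA levels closely.

```python
import math
import random


def pearson(xs, ys):
    """Pearson product moment correlation of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def progressive_correlation(pairs, start=40):
    """Rank (mRNA, protein) pairs by protein level, then compute the
    correlation for the `start` lowest-abundance genes and for each
    successively larger subset. Returns (subset size, r) tuples."""
    ranked = sorted(pairs, key=lambda p: p[1])
    results = []
    for k in range(start, len(ranked) + 1):
        subset = ranked[:k]
        mrna = [m for m, _ in subset]
        protein = [p for _, p in subset]
        results.append((k, pearson(mrna, protein)))
    return results


# Synthetic data, NOT the measurements from this study: 95 low-abundance
# genes with unrelated mRNA and protein levels, plus 11 abundant genes
# whose protein level is roughly proportional to the mRNA level.
rng = random.Random(0)
pairs = [(rng.uniform(0.5, 10.0), rng.uniform(2_000, 50_000)) for _ in range(95)]
pairs += [(m, 3_000 * m * rng.uniform(0.9, 1.1)) for m in range(50, 380, 30)]

trend = progressive_correlation(pairs)
for k, r in trend:
    if k in (40, 95, 106):
        print(f"lowest {k:3d} genes: r = {r:.2f}")
```

With data of this shape the coefficient stays low until the abundant genes enter the calculation, which is the pattern that a single correlation value for the whole data set would hide.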

Like other large-scale analyses, the present study has several 
potential sources of error related to the methods used to de- 
termine mRNA and protein expression levels. The mRNA 
levels were calculated from frequency tables of SAGE data. 
This method is highly quantitative because it is based on actual 
sequencing of unique tags from each gene, and the number of 
times that a tag is represented is proportional to the number of 
mRNA molecules for a specific gene. This method has some 
limitations including the following: (i) the magnitude of the 
error in the measurement of mRNA levels is inversely propor- 
tional to the mRNA levels, (ii) SAGE tags from highly similar 
genes may not be distinguished and therefore are summed, (iii) 
some SAGE tags are from sequences in the 3' untranslated 
region of the transcript, (iv) incomplete cleavage at the SAGE 
tag site by the restriction enzyme can result in two tags repre- 
senting one mRNA, and (v) some transcripts actually do not 
generate a SAGE tag (34, 35). 

For the SAGE method, the error associated with a value 
increases with a decreasing number of transcripts per cell. The 
conclusions drawn from this study are dependent on the qual- 
ity of the mRNA levels from previously published data (35). 
Since more than 65% of the mRNA levels included in this 
study were calculated to 10 copies/cell or less (40% were less 
FIG. 6. Effect of highly abundant proteins on Pearson product moment correlation coefficient for mRNA and protein abundance in yeast. The set of 106 genes was 
ranked according to protein abundance, and the correlation value was calculated by including the 40 lowest-abundance genes and then progressively including the 
remaining 66 genes in order of abundance. The correlation value climbs as the final 11 highly abundant proteins are included. 



than 4 copies/cell), the error associated with these values may 
be quite large. The mRNA levels were calculated from more 
than 20,000 transcripts. Assuming that the estimate of 15,000 
mRNA molecules per cell is correct (16), this would mean that 
mRNA transcripts present at only a single copy per cell would 
be detected 72% of the time (35). The mRNA levels for each 
gene were carefully scrutinized, and only mRNA levels for 
which a high degree of confidence existed were included in the 
correlation value. 
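The sampling argument in this paragraph can be made concrete with a small Python sketch. It assumes that every sequenced tag is an independent draw from a pool of 15,000 mRNA molecules per cell and that tag counts follow Poisson statistics; both are simplifying assumptions, and the function names are hypothetical. The tag total of 20,000 is taken from the phrase "more than 20,000 transcripts"; the exact number used in reference 35 is not restated here, so the result is only expected to land near the quoted 72%, not to reproduce it.

```python
import math


def detection_probability(copies_per_cell, tags_sequenced, mrna_per_cell=15_000):
    """Probability that at least one sequenced tag comes from a transcript
    present at `copies_per_cell`, assuming independent draws from the pool."""
    fraction = copies_per_cell / mrna_per_cell
    return 1.0 - (1.0 - fraction) ** tags_sequenced


def relative_counting_error(expected_tags):
    """Relative standard deviation of a Poisson count with the given mean."""
    return 1.0 / math.sqrt(expected_tags)


tags = 20_000  # assumed round number; the text says only "more than 20,000"
for copies in (1, 4, 10):
    p = detection_probability(copies, tags)
    expected = copies * tags / 15_000
    print(f"{copies:2d} copies/cell: P(detected) = {p:.2f}, "
          f"relative counting error ~ {relative_counting_error(expected):.2f}")
```

Under these assumptions the relative counting error shrinks as the expected tag count grows, which is the sense in which the error is larger for transcripts present at few copies per cell.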

Protein abundance was determined by metabolic radiolabel- 
ing with [35S]methionine. The calculation required knowledge 
of three variables: the number of methionines in the mature 
protein, the radioactivity contained in the protein, and the 
specific activity of the radiolabel normalized per methionine. 
The number of methionines per protein was determined from 
the amino acid sequence of the proteins identified by tandem 
mass spectrometry. For some proteins, it was not known 
whether the methionine of the nascent polypeptide was pro- 
cessed away. The N termini of those proteins were predicted 
based on the specificity of methionine aminopeptidase (31). If 
the N-terminal processing did not conform to the predicted 
specificity of processing enzymes, the calculation of the num- 
ber of methionines would be affected. This discrepancy would 
affect most the quantitation of a protein with a very low num- 
ber of methionines. The average number of calculated methi- 
onines per protein in this study was 7.2. We therefore expect 
the potential for erroneous protein quantitation due to un- 
usual N-terminal processing to be small. 
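The dependence of the calculated protein amount on the methionine count can be sketched as follows. This is an illustrative reading of the three variables named above, not the formula from Materials and Methods: the function names, the choice of units (dpm and dpm per mole of methionine), and the example numbers are all assumptions.

```python
def protein_amount(radioactivity_dpm, n_methionines, dpm_per_mol_methionine):
    """Moles of protein in a spot, assuming the radioactivity is spread
    evenly over all methionines of a single protein species."""
    return radioactivity_dpm / (n_methionines * dpm_per_mol_methionine)


def miscount_error(n_assumed, n_true):
    """Fractional error in the calculated amount when `n_assumed`
    methionines are used but the mature protein really has `n_true`."""
    return n_true / n_assumed - 1.0


# Effect of wrongly predicting removal of the initiator methionine
# (one methionine too few) for proteins with different methionine counts.
for n_assumed in (2, 7, 15):
    err = miscount_error(n_assumed, n_assumed + 1)
    print(f"{n_assumed:2d} methionines assumed, {n_assumed + 1} present: "
          f"amount overestimated by {err:.0%}")
```

The fractional error falls roughly as the reciprocal of the methionine count, so a miscount of one matters most for proteins with very few methionines, as stated in the text.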



The amount of radioactivity contained in a single spot might 
be the sum of the radioactivity of comigrating proteins. Be- 
cause protein identification was based on tandem mass spec- 
trometric techniques, comigrating proteins could be identified. 
However, comigrating proteins were rarely detected in this 
study, most likely because relatively small amounts of total 
protein (40 µg) were initially loaded onto the gels, which re- 
sulted in highly focused spots containing generally 1 to 25 ng of 
protein. Because of the relatively small amount loaded, the 
concentrations of any potentially comigrating protein would 
likely be below the limit of detection of the mass spectrometry 
technique used in this study (1 to 5 ng) and below the limit of 
visualization by silver staining (1 to 5 ng). In the overwhelming 
majority of the samples analyzed, numerous peptides from a 
single protein were detected. It is assumed that any comigrat- 
ing proteins were at levels too low to be detected and that their 
influence in the calculation would be small. 

The specific activity of the radiolabel was determined by 
relating the precise amount of protein present in selected spots 
of a parallel gel, as determined by quantitative amino acid 
composition analysis, to the number of methionines present in 
the sequence of those proteins and the radioactivity deter- 
mined by liquid scintillation counting. It is possible that the 
resulting number might be influenced by unavoidable losses 
inherent in the amino acid analysis procedure applied. Because 
four different proteins were utilized in the calculation and the 
experiment was done in duplicate, the specific activity calcu- 
lated is thought to be highly accurate. Indeed, the specific 






[Figure 7 plot not reproduced. x axis: codon bias (0.0 to 1.0); legend: protein and mRNA data series.] 

FIG. 7. Relationship between codon bias and protein and mRNA levels in this study. Yeast mRNA and protein expression levels were calculated as described in 
Materials and Methods. The data represent the same 106 genes as in Fig. 5. 



activities calculated for each of the four proteins varied by less 
than 10%. Any inconsistencies in the calculation of the specific 
activity would result in differences in the absolute levels calcu- 
lated but not in the relative numbers and would therefore not 
influence the correlation value determined. 

The protein quantitation method used eliminates a number 
of potential errors inherent in previous methods for the quan- 
titation of proteins separated by 2DE, such as preferential 
protein staining and bias caused by inequalities in the number 
of radiolabeled residues per protein. Any 2D gel-based method 
of quantitation is complicated by the fact that in some cases the 
translation products of the same mRNA migrated to different 
spots. One major reason is posttranslational modification or 
processing of the protein. Also, artifactual proteolysis during 
cell lysis and sample preparation can lead to multiple resolved 
forms of the protein. In such cases, the protein levels of spots 
coded for by the same mRNA were pooled. In addition, the 
existence of other spots coded for by the same mRNA that 
were not analyzed by mass spectrometry or that were below the 
limit of detection for silver staining cannot be ruled out. How- 
ever, since this study is based on a class of highly expressed 
proteins, the presence of undetected minor spots below silver 
staining sensitivity corresponding to a protein analyzed in the 
study would generally cause a relatively small error in protein 
quantitation. 

Codon bias is a measure of the propensity of an organism to 
selectively utilize certain codons which result in the incorpo- 
ration of the same amino acid residue in a growing polypeptide 
chain. There are 61 possible codons that code for 20 amino 
acids. The larger the codon bias value, the smaller the number 
of codons that are used to encode the protein (19). It is 



thought that codon bias is a measure of protein abundance 
because highly expressed proteins generally have large codon 
bias values (3, 13). 

Nearly all of the most highly expressed proteins had codon 
bias values of greater than 0.8. However, we detected a number 
of genes with high codon bias and relatively low protein abun- 
dance (Fig. 7). For example, the expressed gene with both the 
second largest protein and mRNA levels in the study was 
ENO2_YEAST (775,000 and 289.1 copies/cell, respectively). 
ENO1_YEAST was also present in the gel at much lower 
protein and mRNA levels (44,200 and 0.7 copies/cell, respec- 
tively). The codon bias values for ENO2 and ENO1 are similar 
(0.96 and 0.93, respectively), but the expression of the two 
genes is differentially regulated. Specifically, ENO1_YEAST is 
glucose repressed (6) and was therefore present in low abun- 
dance under the conditions used. Other genes with large codon 
bias values that were not of high protein abundance in the gel 
include EFT1, TIF1, HXK2, GSP1, EGD2, SHM2, and TAL1. 
We conclude that merely determining the codon bias of a gene 
is not sufficient to predict its protein expression level. 

Interestingly, codon bias appears to be an excellent indicator 
of the boundaries of current 2D gel proteome analysis tech- 
nology. There are thousands of genes with expressed mRNA 
and likely expressed protein with codon bias values less than 
0.1 (Fig. 4A). In this study, we detected none of them, and only 
a very small percentage of the genes detected in this study had 
codon bias values between 0.1 and 0.2 (Fig. 4B). Indeed, in 
every examined yeast proteome study (5, 7, 13, 28) where the 
combined total number of identified proteins is 300 to 400, this 
same observation is true. It is expected that for the more 
complex cells of higher eukaryotic organisms the detection of 
low-abundance proteins would be even more challenging than 
for yeast. This indicates that highly abundant, long-lived pro- 
teins are overwhelmingly detected in proteome studies. If pro- 
teome analysis is to provide truly meaningful information 
about cellular processes, it must be able to penetrate to the 
level of regulatory proteins, including transcription factors and 
protein kinases. A promising approach is the use of narrow- 
range focusing gels with immobilized pH gradients (IPG) (23). 
This would allow for the loading of significantly more protein 
per pH unit covered and also provide increased resolution of 
proteins with similar electrophoretic mobilities. A standard pH 
gradient in an isoelectric focusing gel covers a 7-pH-unit range 
(pH 3 to 10) over 18 cm. A narrow-range focusing gel might 
expand the range to 0.5 pH units over 18 cm or more. This 
could potentially increase by more than 10-fold the number of 
proteins that can be detected. Clearly, current proteome tech- 
nology is incapable of analyzing low-abundance regulatory pro- 
teins without employing an enrichment method for relatively 
low-abundance proteins. In conclusion, this study examined 
the relationship between yeast protein and message levels and 
revealed that transcript levels provide little predictive value 
with respect to the extent of protein expression. 
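The gain in resolution suggested for narrow-range focusing gels follows from simple arithmetic on the figures given above (a 7-pH-unit range versus a 0.5-pH-unit range, each over 18 cm). The Python sketch below only restates that arithmetic; the function name is hypothetical, and the result describes the expansion of gel length per pH unit, not a measured increase in the number of detectable proteins.

```python
def cm_per_ph_unit(gel_length_cm, ph_span):
    """Gel length available per pH unit covered by the gradient."""
    return gel_length_cm / ph_span


standard = cm_per_ph_unit(18.0, 7.0)   # pH 3 to 10 over 18 cm
narrow = cm_per_ph_unit(18.0, 0.5)     # 0.5 pH units over 18 cm
gain = narrow / standard

print(f"standard gel: {standard:.1f} cm per pH unit")
print(f"narrow-range gel: {narrow:.1f} cm per pH unit")
print(f"expansion factor: {gain:.0f}-fold")
```

An expansion factor of this size is in line with the estimate that more than 10-fold more proteins might be detected, provided the protein load per pH unit can be raised accordingly.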